Unsupervised Video Summarization via Multi-source Features

Hussain Kanafani; Junaid Ahmed Ghauri; Sherzod Hakimov; Ralph Ewerth

doi:10.48550/arXiv.2105.12532

Details

Original language	English
Title of host publication	ICMR 2021
Subtitle of host publication	Proceedings of the 2021 International Conference on Multimedia Retrieval
Pages	466-470
Number of pages	5
ISBN (electronic)	9781450384636
Publication status	Published - 1 Sept 2021
Event	11th ACM International Conference on Multimedia Retrieval, ICMR 2021 - Taipei, Taiwan
Duration	16 Nov 2021 → 19 Nov 2021

Publication series

Name	ICMR 2021 - Proceedings of the 2021 International Conference on Multimedia Retrieval

Abstract

Video summarization aims at generating a compact yet representative visual summary that conveys the essence of the original video. The advantage of unsupervised approaches is that they do not require human annotations to learn the summarization capability and generalize to a wider range of domains. Previous work relies on the same type of deep features, typically based on a model pre-trained on ImageNet data. Therefore, we propose to incorporate multiple feature sources with chunk and stride fusion to provide more information about the visual content. For a comprehensive evaluation on the two benchmarks TVSum and SumMe, we compare our method with four state-of-the-art approaches. Two of these approaches were implemented by ourselves to reproduce the reported results. Our evaluation shows that we obtain state-of-the-art results on both datasets while also highlighting the shortcomings of previous work with regard to the evaluation methodology. Finally, we perform error analysis on videos for the two benchmark datasets to summarize and spot the factors that lead to misclassifications.

Keywords

Deep learning, Multi-source combination, Multi-source fusion, Unsupervised video summarization, Video analysis

ASJC Scopus subject areas

Computer Science(all)
Computer Graphics and Computer-Aided Design
Computer Science Applications
Computer Vision and Pattern Recognition
Human-Computer Interaction
Software

Cite this

Unsupervised Video Summarization via Multi-source Features. / Kanafani, Hussain; Ghauri, Junaid Ahmed; Hakimov, Sherzod et al.
ICMR 2021: Proceedings of the 2021 International Conference on Multimedia Retrieval. 2021. p. 466-470 (ICMR 2021 - Proceedings of the 2021 International Conference on Multimedia Retrieval).

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Kanafani, H, Ghauri, JA, Hakimov, S & Ewerth, R 2021, Unsupervised Video Summarization via Multi-source Features. in ICMR 2021: Proceedings of the 2021 International Conference on Multimedia Retrieval. ICMR 2021 - Proceedings of the 2021 International Conference on Multimedia Retrieval, pp. 466-470, 11th ACM International Conference on Multimedia Retrieval, ICMR 2021, Taipei, Taiwan, 16 Nov 2021. https://doi.org/10.48550/arXiv.2105.12532, https://doi.org/10.1145/3460426.3463597

Kanafani, H., Ghauri, J. A., Hakimov, S., & Ewerth, R. (2021). Unsupervised Video Summarization via Multi-source Features. In ICMR 2021: Proceedings of the 2021 International Conference on Multimedia Retrieval (pp. 466-470). (ICMR 2021 - Proceedings of the 2021 International Conference on Multimedia Retrieval). https://doi.org/10.48550/arXiv.2105.12532, https://doi.org/10.1145/3460426.3463597

Kanafani H, Ghauri JA, Hakimov S, Ewerth R. Unsupervised Video Summarization via Multi-source Features. In ICMR 2021: Proceedings of the 2021 International Conference on Multimedia Retrieval. 2021. p. 466-470. (ICMR 2021 - Proceedings of the 2021 International Conference on Multimedia Retrieval). doi: 10.48550/arXiv.2105.12532, 10.1145/3460426.3463597

Kanafani, Hussain ; Ghauri, Junaid Ahmed ; Hakimov, Sherzod et al. / Unsupervised Video Summarization via Multi-source Features. ICMR 2021: Proceedings of the 2021 International Conference on Multimedia Retrieval. 2021. pp. 466-470 (ICMR 2021 - Proceedings of the 2021 International Conference on Multimedia Retrieval).


@inproceedings{0c8637f956bc49e0bba75d033c4a3edf,
title = "Unsupervised Video Summarization via Multi-source Features",
abstract = "Video summarization aims at generating a compact yet representative visual summary that conveys the essence of the original video. The advantage of unsupervised approaches is that they do not require human annotations to learn the summarization capability and generalize to a wider range of domains. Previous work relies on the same type of deep features, typically based on a model pre-trained on ImageNet data. Therefore, we propose to incorporate multiple feature sources with chunk and stride fusion to provide more information about the visual content. For a comprehensive evaluation on the two benchmarks TVSum and SumMe, we compare our method with four state-of-the-art approaches. Two of these approaches were implemented by ourselves to reproduce the reported results. Our evaluation shows that we obtain state-of-the-art results on both datasets while also highlighting the shortcomings of previous work with regard to the evaluation methodology. Finally, we perform error analysis on videos for the two benchmark datasets to summarize and spot the factors that lead to misclassifications.",

keywords = "Deep learning, Multi-source combination, Multi-source fusion, Unsupervised video summarization, Video analysis",
author = "Hussain Kanafani and Ghauri, {Junaid Ahmed} and Sherzod Hakimov and Ralph Ewerth",
year = "2021",
month = sep,
day = "1",
doi = "10.48550/arXiv.2105.12532",
language = "English",
series = "ICMR 2021 - Proceedings of the 2021 International Conference on Multimedia Retrieval",
pages = "466--470",
booktitle = "ICMR 2021",
note = "11th ACM International Conference on Multimedia Retrieval, ICMR 2021 ; Conference date: 16-11-2021 Through 19-11-2021",
}


TY - GEN
T1 - Unsupervised Video Summarization via Multi-source Features
AU - Kanafani, Hussain
AU - Ghauri, Junaid Ahmed
AU - Hakimov, Sherzod
AU - Ewerth, Ralph
PY - 2021/9/1
Y1 - 2021/9/1
N2 - Video summarization aims at generating a compact yet representative visual summary that conveys the essence of the original video. The advantage of unsupervised approaches is that they do not require human annotations to learn the summarization capability and generalize to a wider range of domains. Previous work relies on the same type of deep features, typically based on a model pre-trained on ImageNet data. Therefore, we propose to incorporate multiple feature sources with chunk and stride fusion to provide more information about the visual content. For a comprehensive evaluation on the two benchmarks TVSum and SumMe, we compare our method with four state-of-the-art approaches. Two of these approaches were implemented by ourselves to reproduce the reported results. Our evaluation shows that we obtain state-of-the-art results on both datasets while also highlighting the shortcomings of previous work with regard to the evaluation methodology. Finally, we perform error analysis on videos for the two benchmark datasets to summarize and spot the factors that lead to misclassifications.

AB - Video summarization aims at generating a compact yet representative visual summary that conveys the essence of the original video. The advantage of unsupervised approaches is that they do not require human annotations to learn the summarization capability and generalize to a wider range of domains. Previous work relies on the same type of deep features, typically based on a model pre-trained on ImageNet data. Therefore, we propose to incorporate multiple feature sources with chunk and stride fusion to provide more information about the visual content. For a comprehensive evaluation on the two benchmarks TVSum and SumMe, we compare our method with four state-of-the-art approaches. Two of these approaches were implemented by ourselves to reproduce the reported results. Our evaluation shows that we obtain state-of-the-art results on both datasets while also highlighting the shortcomings of previous work with regard to the evaluation methodology. Finally, we perform error analysis on videos for the two benchmark datasets to summarize and spot the factors that lead to misclassifications.

KW - Deep learning
KW - Multi-source combination
KW - Multi-source fusion
KW - Unsupervised video summarization
KW - Video analysis
UR - http://www.scopus.com/inward/record.url?scp=85114886998&partnerID=8YFLogxK
U2 - 10.48550/arXiv.2105.12532
DO - 10.48550/arXiv.2105.12532
M3 - Conference contribution
AN - SCOPUS:85114886998
T3 - ICMR 2021 - Proceedings of the 2021 International Conference on Multimedia Retrieval
SP - 466
EP - 470
BT - ICMR 2021
T2 - 11th ACM International Conference on Multimedia Retrieval, ICMR 2021
Y2 - 16 November 2021 through 19 November 2021
ER -
