Unsupervised Video Summarization via Multi-source Features

Research output: Chapter in book/report/conference proceeding; Conference contribution; Research; peer review

Authors

  • Hussain Kanafani
  • Junaid Ahmed Ghauri
  • Sherzod Hakimov
  • Ralph Ewerth

Research Organisations

External Research Organisations

  • German National Library of Science and Technology (TIB)

Details

Original language: English
Title of host publication: ICMR 2021
Subtitle of host publication: Proceedings of the 2021 International Conference on Multimedia Retrieval
Pages: 466-470
Number of pages: 5
ISBN (electronic): 9781450384636
Publication status: Published - 1 Sept 2021
Event: 11th ACM International Conference on Multimedia Retrieval, ICMR 2021 - Taipei, Taiwan
Duration: 16 Nov 2021 to 19 Nov 2021

Publication series

Name: ICMR 2021 - Proceedings of the 2021 International Conference on Multimedia Retrieval

Abstract

Video summarization aims at generating a compact yet representative visual summary that conveys the essence of the original video. The advantage of unsupervised approaches is that they do not require human annotations to learn the summarization capability and generalize to a wider range of domains. Previous work relies on the same type of deep features, typically based on a model pre-trained on ImageNet data. Therefore, we propose to incorporate multiple feature sources with chunk and stride fusion to provide more information about the visual content. For a comprehensive evaluation on the two benchmarks TVSum and SumMe, we compare our method with four state-of-the-art approaches. Two of these approaches were implemented by ourselves to reproduce the reported results. Our evaluation shows that we obtain state-of-the-art results on both datasets while also highlighting the shortcomings of previous work with regard to the evaluation methodology. Finally, we perform error analysis on videos for the two benchmark datasets to summarize and spot the factors that lead to misclassifications.

Keywords

    Deep learning, Multi-source combination, Multi-source fusion, Unsupervised video summarization, Video analysis

Cite this

Unsupervised Video Summarization via Multi-source Features. / Kanafani, Hussain; Ghauri, Junaid Ahmed; Hakimov, Sherzod et al.
ICMR 2021: Proceedings of the 2021 International Conference on Multimedia Retrieval. 2021. p. 466-470 (ICMR 2021 - Proceedings of the 2021 International Conference on Multimedia Retrieval).

Kanafani, H, Ghauri, JA, Hakimov, S & Ewerth, R 2021, Unsupervised Video Summarization via Multi-source Features. in ICMR 2021: Proceedings of the 2021 International Conference on Multimedia Retrieval. ICMR 2021 - Proceedings of the 2021 International Conference on Multimedia Retrieval, pp. 466-470, 11th ACM International Conference on Multimedia Retrieval, ICMR 2021, Taipei, Taiwan, 16 Nov 2021. https://doi.org/10.48550/arXiv.2105.12532, https://doi.org/10.1145/3460426.3463597
Kanafani, H., Ghauri, J. A., Hakimov, S., & Ewerth, R. (2021). Unsupervised Video Summarization via Multi-source Features. In ICMR 2021: Proceedings of the 2021 International Conference on Multimedia Retrieval (pp. 466-470). (ICMR 2021 - Proceedings of the 2021 International Conference on Multimedia Retrieval). https://doi.org/10.48550/arXiv.2105.12532, https://doi.org/10.1145/3460426.3463597
Kanafani H, Ghauri JA, Hakimov S, Ewerth R. Unsupervised Video Summarization via Multi-source Features. In ICMR 2021: Proceedings of the 2021 International Conference on Multimedia Retrieval. 2021. p. 466-470. (ICMR 2021 - Proceedings of the 2021 International Conference on Multimedia Retrieval). doi: 10.48550/arXiv.2105.12532, 10.1145/3460426.3463597
Kanafani, Hussain ; Ghauri, Junaid Ahmed ; Hakimov, Sherzod et al. / Unsupervised Video Summarization via Multi-source Features. ICMR 2021: Proceedings of the 2021 International Conference on Multimedia Retrieval. 2021. pp. 466-470 (ICMR 2021 - Proceedings of the 2021 International Conference on Multimedia Retrieval).
@inproceedings{0c8637f956bc49e0bba75d033c4a3edf,
title = "Unsupervised Video Summarization via Multi-source Features",
keywords = "Deep learning, Multi-source combination, Multi-source fusion, Unsupervised video summarization, Video analysis",
author = "Hussain Kanafani and Ghauri, {Junaid Ahmed} and Sherzod Hakimov and Ralph Ewerth",
year = "2021",
month = sep,
day = "1",
doi = "10.48550/arXiv.2105.12532",
language = "English",
series = "ICMR 2021 - Proceedings of the 2021 International Conference on Multimedia Retrieval",
pages = "466--470",
booktitle = "ICMR 2021",
note = "11th ACM International Conference on Multimedia Retrieval, ICMR 2021 ; Conference date: 16-11-2021 Through 19-11-2021",

}

TY - GEN

T1 - Unsupervised Video Summarization via Multi-source Features

AU - Kanafani, Hussain

AU - Ghauri, Junaid Ahmed

AU - Hakimov, Sherzod

AU - Ewerth, Ralph

PY - 2021/9/1

Y1 - 2021/9/1

AB - Video summarization aims at generating a compact yet representative visual summary that conveys the essence of the original video. The advantage of unsupervised approaches is that they do not require human annotations to learn the summarization capability and generalize to a wider range of domains. Previous work relies on the same type of deep features, typically based on a model pre-trained on ImageNet data. Therefore, we propose to incorporate multiple feature sources with chunk and stride fusion to provide more information about the visual content. For a comprehensive evaluation on the two benchmarks TVSum and SumMe, we compare our method with four state-of-the-art approaches. Two of these approaches were implemented by ourselves to reproduce the reported results. Our evaluation shows that we obtain state-of-the-art results on both datasets while also highlighting the shortcomings of previous work with regard to the evaluation methodology. Finally, we perform error analysis on videos for the two benchmark datasets to summarize and spot the factors that lead to misclassifications.

KW - Deep learning

KW - Multi-source combination

KW - Multi-source fusion

KW - Unsupervised video summarization

KW - Video analysis

UR - http://www.scopus.com/inward/record.url?scp=85114886998&partnerID=8YFLogxK

U2 - 10.48550/arXiv.2105.12532

DO - 10.48550/arXiv.2105.12532

M3 - Conference contribution

AN - SCOPUS:85114886998

T3 - ICMR 2021 - Proceedings of the 2021 International Conference on Multimedia Retrieval

SP - 466

EP - 470

BT - ICMR 2021

T2 - 11th ACM International Conference on Multimedia Retrieval, ICMR 2021

Y2 - 16 November 2021 through 19 November 2021

ER -