Supervised video summarization via multiple feature sets with parallel attention

Publication: Contribution to book/report/anthology/conference proceedings; Conference paper; Research; Peer-reviewed

Authors

  • Junaid Ahmed Ghauri
  • Sherzod Hakimov
  • Ralph Ewerth

Organizational units

External organizations

  • Technische Informationsbibliothek (TIB) Leibniz-Informationszentrum Technik und Naturwissenschaften und Universitätsbibliothek

Details

Original language: English
Title of host publication: 2021 IEEE International Conference on Multimedia and Expo
Subtitle: ICME 2021
Publisher: IEEE Computer Society
ISBN (electronic): 9781665438643
Publication status: Published - 2021
Event: 2021 IEEE International Conference on Multimedia and Expo, ICME 2021 - Shenzhen, China
Duration: 5 July 2021 to 9 July 2021

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
ISSN (Print): 1945-7871
ISSN (electronic): 1945-788X

Abstract

The assignment of importance scores to particular frames or (short) segments in a video is crucial for summarization, but also a difficult task. Previous work utilizes only one source of visual features. In this paper, we suggest a novel model architecture that combines three feature sets for visual content and motion to predict importance scores. The proposed architecture utilizes an attention mechanism before fusing motion features and features representing the (static) visual content, i.e., derived from an image classification model. Comprehensive experimental evaluations are reported for two well-known datasets, SumMe and TVSum. In this context, we identify methodological issues in how previous work used these benchmark datasets, and present a fair evaluation scheme with appropriate data splits that can be used in future work. When using static and motion features with the parallel attention mechanism, we improve state-of-the-art results for SumMe, while being on par with the state of the art for the TVSum dataset.
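
The idea of attending to each feature set in parallel before fusing them can be illustrated with a minimal sketch. The abstract does not specify the attention variant, the fusion operator, or the scoring head, so everything below is an assumption made for illustration: plain scaled dot-product self-attention without learned projections, fusion by concatenation, and a single linear layer with a sigmoid. The function names, feature dimensions, and random inputs are hypothetical and do not reproduce the authors' model.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Scaled dot-product self-attention over one feature stream.

    Assumption: no learned query/key/value projections, to keep the sketch short.
    `frames` is a list of per-frame feature vectors of equal length.
    """
    d = len(frames[0])
    attended = []
    for query in frames:
        logits = [sum(a * b for a, b in zip(query, key)) / math.sqrt(d)
                  for key in frames]
        weights = softmax(logits)
        attended.append([sum(weights[t] * frames[t][j] for t in range(len(frames)))
                         for j in range(d)])
    return attended


def importance_scores(streams, weights, bias=0.0):
    """Attend to every stream independently, then fuse and score each frame.

    Assumptions: fusion is concatenation and the scoring head is one
    linear layer followed by a sigmoid.
    """
    attended = [self_attention(stream) for stream in streams]
    scores = []
    for t in range(len(streams[0])):
        fused = [x for stream in attended for x in stream[t]]
        z = sum(w * x for w, x in zip(weights, fused)) + bias
        scores.append(1.0 / (1.0 + math.exp(-z)))
    return scores


# Hypothetical toy input: three feature streams for a 5-frame video.
rng = random.Random(0)
n_frames = 5
dims = [4, 4, 3]  # illustrative feature sizes, not taken from the paper
streams = [[[rng.gauss(0, 1) for _ in range(d)] for _ in range(n_frames)]
           for d in dims]
weights = [rng.gauss(0, 0.5) for _ in range(sum(dims))]

scores = importance_scores(streams, weights)
print([round(s, 3) for s in scores])
```

Each stream is attended to on its own, so the temporal weighting of one feature set does not depend on the others; only the fused per-frame vector sees all of them. In a trained model the weights would be learned from annotated importance scores rather than sampled at random.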

ASJC Scopus subject areas

Cite this

Supervised video summarization via multiple feature sets with parallel attention. / Ghauri, Junaid Ahmed; Hakimov, Sherzod; Ewerth, Ralph.
2021 IEEE International Conference on Multimedia and Expo: ICME 2021. IEEE Computer Society, 2021. (Proceedings - IEEE International Conference on Multimedia and Expo).

Ghauri, JA, Hakimov, S & Ewerth, R 2021, Supervised video summarization via multiple feature sets with parallel attention. in 2021 IEEE International Conference on Multimedia and Expo: ICME 2021. Proceedings - IEEE International Conference on Multimedia and Expo, IEEE Computer Society, 2021 IEEE International Conference on Multimedia and Expo, ICME 2021, Shenzhen, China, 5 July 2021. https://doi.org/10.48550/arXiv.2104.11530, https://doi.org/10.1109/ICME51207.2021.9428318
Ghauri, J. A., Hakimov, S., & Ewerth, R. (2021). Supervised video summarization via multiple feature sets with parallel attention. In 2021 IEEE International Conference on Multimedia and Expo: ICME 2021 (Proceedings - IEEE International Conference on Multimedia and Expo). IEEE Computer Society. https://doi.org/10.48550/arXiv.2104.11530, https://doi.org/10.1109/ICME51207.2021.9428318
Ghauri JA, Hakimov S, Ewerth R. Supervised video summarization via multiple feature sets with parallel attention. in 2021 IEEE International Conference on Multimedia and Expo: ICME 2021. IEEE Computer Society. 2021. (Proceedings - IEEE International Conference on Multimedia and Expo). doi: 10.48550/arXiv.2104.11530, 10.1109/ICME51207.2021.9428318
Ghauri, Junaid Ahmed ; Hakimov, Sherzod ; Ewerth, Ralph. / Supervised video summarization via multiple feature sets with parallel attention. 2021 IEEE International Conference on Multimedia and Expo: ICME 2021. IEEE Computer Society, 2021. (Proceedings - IEEE International Conference on Multimedia and Expo).
@inproceedings{40502e7cac3f487ab910646bb91ad9de,
title = "Supervised video summarization via multiple feature sets with parallel attention",
abstract = "The assignment of importance scores to particular frames or (short) segments in a video is crucial for summarization, but also a difficult task. Previous work utilizes only one source of visual features. In this paper, we suggest a novel model architecture that combines three feature sets for visual content and motion to predict importance scores. The proposed architecture utilizes an attention mechanism before fusing motion features and features representing the (static) visual content, i.e., derived from an image classification model. Comprehensive experimental evaluations are reported for two well-known datasets, SumMe and TVSum. In this context, we identify methodological issues on how previous work used these benchmark datasets, and present a fair evaluation scheme with appropriate data splits that can be used in future work. When using static and motion features with the parallel attention mechanism, we improve state-of-the-art results for SumMe, while being on par with the state of the art for TVSum dataset.",
keywords = "attention mechanism, motion features, supervised video summarization, visual attention",
author = "Ghauri, {Junaid Ahmed} and Sherzod Hakimov and Ralph Ewerth",
year = "2021",
doi = "10.48550/arXiv.2104.11530",
language = "English",
series = "Proceedings - IEEE International Conference on Multimedia and Expo",
publisher = "IEEE Computer Society",
booktitle = "2021 IEEE International Conference on Multimedia and Expo",
address = "United States",
note = "2021 IEEE International Conference on Multimedia and Expo, ICME 2021 ; Conference date: 05-07-2021 Through 09-07-2021",

}

TY - GEN
T1 - Supervised video summarization via multiple feature sets with parallel attention
AU - Ghauri, Junaid Ahmed
AU - Hakimov, Sherzod
AU - Ewerth, Ralph
PY - 2021
Y1 - 2021

N2 - The assignment of importance scores to particular frames or (short) segments in a video is crucial for summarization, but also a difficult task. Previous work utilizes only one source of visual features. In this paper, we suggest a novel model architecture that combines three feature sets for visual content and motion to predict importance scores. The proposed architecture utilizes an attention mechanism before fusing motion features and features representing the (static) visual content, i.e., derived from an image classification model. Comprehensive experimental evaluations are reported for two well-known datasets, SumMe and TVSum. In this context, we identify methodological issues on how previous work used these benchmark datasets, and present a fair evaluation scheme with appropriate data splits that can be used in future work. When using static and motion features with the parallel attention mechanism, we improve state-of-the-art results for SumMe, while being on par with the state of the art for TVSum dataset.

AB - The assignment of importance scores to particular frames or (short) segments in a video is crucial for summarization, but also a difficult task. Previous work utilizes only one source of visual features. In this paper, we suggest a novel model architecture that combines three feature sets for visual content and motion to predict importance scores. The proposed architecture utilizes an attention mechanism before fusing motion features and features representing the (static) visual content, i.e., derived from an image classification model. Comprehensive experimental evaluations are reported for two well-known datasets, SumMe and TVSum. In this context, we identify methodological issues on how previous work used these benchmark datasets, and present a fair evaluation scheme with appropriate data splits that can be used in future work. When using static and motion features with the parallel attention mechanism, we improve state-of-the-art results for SumMe, while being on par with the state of the art for TVSum dataset.

KW - attention mechanism
KW - motion features
KW - supervised video summarization
KW - visual attention
UR - http://www.scopus.com/inward/record.url?scp=85119321940&partnerID=8YFLogxK
U2 - 10.48550/arXiv.2104.11530
DO - 10.48550/arXiv.2104.11530
M3 - Conference contribution
AN - SCOPUS:85119321940
T3 - Proceedings - IEEE International Conference on Multimedia and Expo
BT - 2021 IEEE International Conference on Multimedia and Expo
PB - IEEE Computer Society
T2 - 2021 IEEE International Conference on Multimedia and Expo, ICME 2021
Y2 - 5 July 2021 through 9 July 2021
ER -