Details
Original language | English |
---|---|
Title of host publication | 2021 IEEE International Conference on Multimedia and Expo |
Subtitle | ICME 2021 |
Publisher | IEEE Computer Society |
ISBN (electronic) | 9781665438643 |
Publication status | Published - 2021 |
Event | 2021 IEEE International Conference on Multimedia and Expo, ICME 2021 - Shenzhen, China. Duration: 5 July 2021 → 9 July 2021 |
Publication series
Name | Proceedings - IEEE International Conference on Multimedia and Expo |
---|---|
ISSN (Print) | 1945-7871 |
ISSN (electronic) | 1945-788X |
Abstract
The assignment of importance scores to particular frames or (short) segments in a video is crucial for summarization, but also a difficult task. Previous work utilizes only one source of visual features. In this paper, we suggest a novel model architecture that combines three feature sets for visual content and motion to predict importance scores. The proposed architecture utilizes an attention mechanism before fusing motion features and features representing the (static) visual content, i.e., derived from an image classification model. Comprehensive experimental evaluations are reported for two well-known datasets, SumMe and TVSum. In this context, we identify methodological issues in how previous work used these benchmark datasets, and present a fair evaluation scheme with appropriate data splits that can be used in future work. When using static and motion features with the parallel attention mechanism, we improve state-of-the-art results for SumMe, while being on par with the state of the art for the TVSum dataset.
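As a rough illustration of the idea described in the abstract (attention applied to each feature stream in parallel, before the streams are fused), the following is a minimal sketch in plain Python. It is not the authors' implementation. The unprojected dot-product self-attention, the fusion by concatenation, the fixed linear scoring weights, and all function names are assumptions made for illustration. For brevity it uses two streams (static and motion), whereas the paper combines three feature sets.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Scaled dot-product self-attention over a sequence of frame vectors.

    Assumption: queries, keys and values are the raw features (no learned
    projections), which keeps the sketch dependency-free.
    """
    dim = len(frames[0])
    attended = []
    for q in frames:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in frames]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, frames)) for j in range(dim)]
        )
    return attended


def predict_importance(static_feats, motion_feats, weights, bias=0.0):
    """Attend to each stream independently, fuse by concatenation, score each frame."""
    static_att = self_attention(static_feats)  # stream for static visual content
    motion_att = self_attention(motion_feats)  # stream for motion, independent of the first
    scores = []
    for s, m in zip(static_att, motion_att):
        fused = s + m  # list concatenation stands in for feature fusion
        logit = sum(w * x for w, x in zip(weights, fused)) + bias
        scores.append(1.0 / (1.0 + math.exp(-logit)))  # sigmoid maps to (0, 1)
    return scores


# Toy input: 3 frames with 2-d static and 2-d motion features (made-up numbers).
static_feats = [[0.1, 0.9], [0.8, 0.2], [0.4, 0.4]]
motion_feats = [[0.0, 0.5], [0.7, 0.1], [0.3, 0.3]]
weights = [0.5, -0.25, 1.0, 0.75]  # hypothetical, untrained weights
scores = predict_importance(static_feats, motion_feats, weights)
```

Here `scores` holds one importance value per frame. In a trained model the scoring weights would be learned from annotated importance scores rather than fixed by hand.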
ASJC Scopus subject areas
- Computer Science (all)
- Computer Networks and Communications
- Computer Science Applications
Cite this
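Supervised video summarization via multiple feature sets with parallel attention. / Ghauri, Junaid Ahmed; Hakimov, Sherzod; Ewerth, Ralph. 2021 IEEE International Conference on Multimedia and Expo: ICME 2021. IEEE Computer Society, 2021. (Proceedings - IEEE International Conference on Multimedia and Expo).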
Publication: Chapter in book/report/anthology/conference proceedings › Conference contribution › Research › Peer-reviewed
TY - GEN
T1 - Supervised video summarization via multiple feature sets with parallel attention
AU - Ghauri, Junaid Ahmed
AU - Hakimov, Sherzod
AU - Ewerth, Ralph
PY - 2021
Y1 - 2021
N2 - The assignment of importance scores to particular frames or (short) segments in a video is crucial for summarization, but also a difficult task. Previous work utilizes only one source of visual features. In this paper, we suggest a novel model architecture that combines three feature sets for visual content and motion to predict importance scores. The proposed architecture utilizes an attention mechanism before fusing motion features and features representing the (static) visual content, i.e., derived from an image classification model. Comprehensive experimental evaluations are reported for two well-known datasets, SumMe and TVSum. In this context, we identify methodological issues in how previous work used these benchmark datasets, and present a fair evaluation scheme with appropriate data splits that can be used in future work. When using static and motion features with the parallel attention mechanism, we improve state-of-the-art results for SumMe, while being on par with the state of the art for the TVSum dataset.
KW - attention mechanism
KW - motion features
KW - supervised video summarization
KW - visual attention
UR - http://www.scopus.com/inward/record.url?scp=85119321940&partnerID=8YFLogxK
U2 - 10.48550/arXiv.2104.11530
DO - 10.48550/arXiv.2104.11530
M3 - Conference contribution
AN - SCOPUS:85119321940
T3 - Proceedings - IEEE International Conference on Multimedia and Expo
BT - 2021 IEEE International Conference on Multimedia and Expo
PB - IEEE Computer Society
T2 - 2021 IEEE International Conference on Multimedia and Expo, ICME 2021
Y2 - 5 July 2021 through 9 July 2021
ER -