Details
Original language | English |
---|---|
Title of host publication | Advances in Multimedia Modeling |
Subtitle | 18th International Conference, MMM 2012, Proceedings |
Pages | 40-50 |
Number of pages | 11 |
Publication status | Published - 2012 |
Externally published | Yes |
Event | 18th International Conference on Multimedia Modeling, MMM 2012 - Klagenfurt, Austria; Duration: 4 Jan 2012 → 6 Jan 2012 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 7131 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Abstract
State-of-the-art systems for video concept detection mainly rely on visual features. Some previous approaches have also included audio features, either using low-level features such as mel-frequency cepstral coefficients (MFCC) or exploiting the detection of specific audio concepts. In this paper, we investigate a bag of auditory words (BoAW) approach that models MFCC features in an auditory vocabulary. The resulting BoAW features are combined with state-of-the-art visual features via multiple kernel learning (MKL). Experiments on a large set of 101 video concepts from the MediaMill Challenge show the effectiveness of using BoAW features: The system using BoAW features and a support vector machine with a χ²-kernel is superior to a state-of-the-art audio approach relying on probabilistic latent semantic indexing. Furthermore, it is shown that an early fusion approach degrades detection performance, whereas the combination of auditory and visual bag of words features via MKL yields a relative performance improvement of 9%.
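The pipeline outlined in the abstract (MFCC frames quantized against an auditory vocabulary, a χ²-kernel on the resulting histograms, and a kernel-level combination of audio and visual features) can be illustrated with a minimal sketch. It makes several assumptions. Each MFCC frame is hard-assigned to its nearest codeword. Histograms are L1-normalized. The exponential χ² kernel uses the convention exp(-γ Σ (a−b)²/(a+b)). Fixed kernel weights stand in for the weights that MKL would actually learn jointly with the SVM. The toy 2-D "MFCC" frames, the two-word codebook and every function name are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[Vector, Vector]  # (histogram of clip 1, histogram of clip 2)


def nearest_codeword(frame: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codeword closest to `frame` (squared Euclidean distance)."""
    best_idx, best_dist = 0, math.inf
    for idx, word in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(frame, word))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def boaw_histogram(mfcc_frames: Sequence[Vector], codebook: Sequence[Vector]) -> List[float]:
    """Bag of auditory words (assumed hard assignment): count how often each
    auditory word occurs in the clip and L1-normalize the counts."""
    counts = [0.0] * len(codebook)
    for frame in mfcc_frames:
        counts[nearest_codeword(frame, codebook)] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total > 0 else counts


def chi2_distance(h1: Vector, h2: Vector) -> float:
    """Sum of (a - b)^2 / (a + b) over bins; bins empty in both are skipped."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def chi2_kernel(h1: Vector, h2: Vector, gamma: float = 1.0) -> float:
    """Exponential chi-square kernel exp(-gamma * chi2_distance)."""
    return math.exp(-gamma * chi2_distance(h1, h2))


def mkl_style_kernel(audio: Pair, visual: Pair, beta_audio: float, beta_visual: float) -> float:
    """Non-negative weighted sum of per-modality kernels. In MKL the weights are
    learned with the classifier; in this sketch they are simply given."""
    if beta_audio < 0 or beta_visual < 0:
        raise ValueError("kernel weights must be non-negative")
    return beta_audio * chi2_kernel(*audio) + beta_visual * chi2_kernel(*visual)


def early_fusion_kernel(audio: Pair, visual: Pair) -> float:
    """Early fusion for comparison: concatenate both histograms, apply one kernel."""
    x = list(audio[0]) + list(visual[0])
    y = list(audio[1]) + list(visual[1])
    return chi2_kernel(x, y)


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0]]  # hypothetical 2-word auditory vocabulary
    clip = [[0.1, 0.0], [0.9, 1.1], [1.0, 0.8], [0.0, 0.2]]  # toy "MFCC" frames
    print(boaw_histogram(clip, codebook))  # [0.5, 0.5]
```

Keeping one kernel per modality lets a combination step weight audio and visual similarity separately. Early fusion instead feeds a single kernel one concatenated vector, which is the variant the abstract reports as degrading performance.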
ASJC Scopus subject areas
- Mathematics (all)
- Theoretical Computer Science
- Computer Science (all)
- General Computer Science
Cite this
Advances in Multimedia Modeling: 18th International Conference, MMM 2012, Proceedings. 2012. p. 40-50 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7131 LNCS).
Publication: Contribution to book/report/anthology/conference proceedings › Article in conference proceedings › Research › Peer-reviewed
TY - GEN
T1 - Multimodal video concept detection via bag of auditory words and multiple kernel learning
AU - Mühling, Markus
AU - Ewerth, Ralph
AU - Zhou, Jun
AU - Freisleben, Bernd
PY - 2012
Y1 - 2012
N2 - State-of-the-art systems for video concept detection mainly rely on visual features. Some previous approaches have also included audio features, either using low-level features such as mel-frequency cepstral coefficients (MFCC) or exploiting the detection of specific audio concepts. In this paper, we investigate a bag of auditory words (BoAW) approach that models MFCC features in an auditory vocabulary. The resulting BoAW features are combined with state-of-the-art visual features via multiple kernel learning (MKL). Experiments on a large set of 101 video concepts from the MediaMill Challenge show the effectiveness of using BoAW features: The system using BoAW features and a support vector machine with a χ²-kernel is superior to a state-of-the-art audio approach relying on probabilistic latent semantic indexing. Furthermore, it is shown that an early fusion approach degrades detection performance, whereas the combination of auditory and visual bag of words features via MKL yields a relative performance improvement of 9%.
AB - State-of-the-art systems for video concept detection mainly rely on visual features. Some previous approaches have also included audio features, either using low-level features such as mel-frequency cepstral coefficients (MFCC) or exploiting the detection of specific audio concepts. In this paper, we investigate a bag of auditory words (BoAW) approach that models MFCC features in an auditory vocabulary. The resulting BoAW features are combined with state-of-the-art visual features via multiple kernel learning (MKL). Experiments on a large set of 101 video concepts from the MediaMill Challenge show the effectiveness of using BoAW features: The system using BoAW features and a support vector machine with a χ²-kernel is superior to a state-of-the-art audio approach relying on probabilistic latent semantic indexing. Furthermore, it is shown that an early fusion approach degrades detection performance, whereas the combination of auditory and visual bag of words features via MKL yields a relative performance improvement of 9%.
KW - audio codebook
KW - bag of auditory words
KW - bag of words
KW - multiple kernel learning
KW - video retrieval
KW - Visual concept detection
UR - http://www.scopus.com/inward/record.url?scp=84862949691&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-27355-1_7
DO - 10.1007/978-3-642-27355-1_7
M3 - Conference contribution
AN - SCOPUS:84862949691
SN - 9783642273544
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 40
EP - 50
BT - Advances in Multimedia Modeling
T2 - 18th International Conference on Multimedia Modeling, MMM 2012
Y2 - 4 January 2012 through 6 January 2012
ER -