Deep learning for content-based video retrieval in film and television production

Markus Mühling; Nikolaus Korfhage; Christian Otto; Matthias Springstein; Thomas Langelage; Uli Veith; Ralph Ewerth; Bernd Freisleben; Eric Müller-Budack

doi:10.1007/s11042-017-4962-9

Details

Original language	English
Pages (from-to)	22169-22194
Number of pages	26
Journal	Multimedia Tools and Applications
Volume	76
Issue number	21
Publication status	Published - 5 July 2017

Abstract

While digitization has changed the workflow of professional media production, the content-based labeling of image sequences and video footage, necessary for all subsequent stages of film and television production, archival or marketing is typically still performed manually and thus quite time-consuming. In this paper, we present deep learning approaches to support professional media production. In particular, novel algorithms for visual concept detection, similarity search, face detection, face recognition and face clustering are combined in a multimedia tool for effective video inspection and retrieval. The analysis algorithms for concept detection and similarity search are combined in a multi-task learning approach to share network weights, saving almost half of the computation time. Furthermore, a new visual concept lexicon tailored to fast video retrieval for media production and novel visualization components are introduced. Experimental results show the quality of the proposed approaches. For example, concept detection achieves a mean average precision of approximately 90% on the top-100 video shots, and face recognition clearly outperforms the baseline on the public Movie Trailers Face Dataset.
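The retrieval figure quoted in the abstract (mean average precision on the top-100 video shots) can be illustrated with a minimal sketch. The function names and the normalization by the number of relevant shots found within the top k are assumptions made for illustration; the abstract does not state which variant of average precision the authors compute.

```python
from typing import Sequence


def average_precision_at_k(relevance: Sequence[bool], k: int = 100) -> float:
    """Average precision over the top-k ranked shots for one concept.

    relevance[i] is True if the shot at rank i + 1 shows the concept.
    Normalizing by the number of relevant shots found in the top k is
    an assumption; other variants divide by min(k, total relevant).
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance[:k], start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank, accumulated only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[bool]], k: int = 100) -> float:
    """Mean of the per-concept average precision values."""
    if not rankings:
        return 0.0
    return sum(average_precision_at_k(r, k) for r in rankings) / len(rankings)
```

Each inner sequence passed to `mean_average_precision` stands for one concept's ranked shot list, so a lexicon of many concepts yields one averaged score.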

ASJC Scopus subject areas

Computer Science (all)
Software
Engineering (all)
Media Technology
Computer Science (all)
Hardware and Architecture
Computer Science (all)
Computer Networks and Communications

Cite this

Deep learning for content-based video retrieval in film and television production. / Mühling, Markus; Korfhage, Nikolaus; Otto, Christian et al.
In: Multimedia Tools and Applications, Vol. 76, No. 21, 05.07.2017, p. 22169-22194.

Research output: Contribution to journal › Article › Research › Peer review

Mühling, M, Korfhage, N, Otto, C, Springstein, M, Langelage, T, Veith, U, Ewerth, R, Freisleben, B & Müller-Budack, E 2017, 'Deep learning for content-based video retrieval in film and television production', Multimedia Tools and Applications, vol. 76, no. 21, pp. 22169-22194. https://doi.org/10.1007/s11042-017-4962-9

Mühling, M., Korfhage, N., Otto, C., Springstein, M., Langelage, T., Veith, U., Ewerth, R., Freisleben, B., & Müller-Budack, E. (2017). Deep learning for content-based video retrieval in film and television production. Multimedia Tools and Applications, 76(21), 22169-22194. https://doi.org/10.1007/s11042-017-4962-9

Mühling M, Korfhage N, Otto C, Springstein M, Langelage T, Veith U et al. Deep learning for content-based video retrieval in film and television production. Multimedia Tools and Applications. 2017 Jul 5;76(21):22169-22194. doi: 10.1007/s11042-017-4962-9

Mühling, Markus ; Korfhage, Nikolaus ; Otto, Christian et al. / Deep learning for content-based video retrieval in film and television production. In: Multimedia Tools and Applications. 2017 ; Vol. 76, No. 21. pp. 22169-22194.

@article{af220cec09dd4780abb49993d89b4afb,

title = "Deep learning for content-based video retrieval in film and television production",

abstract = "While digitization has changed the workflow of professional media production, the content-based labeling of image sequences and video footage, necessary for all subsequent stages of film and television production, archival or marketing is typically still performed manually and thus quite time-consuming. In this paper, we present deep learning approaches to support professional media production. In particular, novel algorithms for visual concept detection, similarity search, face detection, face recognition and face clustering are combined in a multimedia tool for effective video inspection and retrieval. The analysis algorithms for concept detection and similarity search are combined in a multi-task learning approach to share network weights, saving almost half of the computation time. Furthermore, a new visual concept lexicon tailored to fast video retrieval for media production and novel visualization components are introduced. Experimental results show the quality of the proposed approaches. For example, concept detection achieves a mean average precision of approximately 90% on the top-100 video shots, and face recognition clearly outperforms the baseline on the public Movie Trailers Face Dataset.",

keywords = "Deep learning, Face recognition, Image and video analysis, Media production, Similarity search, Visual concept detection",

author = "Markus M{\"u}hling and Nikolaus Korfhage and Christian Otto and Matthias Springstein and Thomas Langelage and Uli Veith and Ralph Ewerth and Bernd Freisleben and Eric M{\"u}ller-Budack",

note = "Funding information: This work is financially supported by the German Federal Ministry for Economic Affairs and Energy (BMWi) in the ZIM Programme.",

year = "2017",

month = jul,

day = "5",

doi = "10.1007/s11042-017-4962-9",

language = "English",

volume = "76",

pages = "22169--22194",

journal = "Multimedia Tools and Applications",

issn = "1380-7501",

publisher = "Springer Netherlands",

number = "21",

}

TY - JOUR

T1 - Deep learning for content-based video retrieval in film and television production

AU - Mühling, Markus

AU - Korfhage, Nikolaus

AU - Otto, Christian

AU - Springstein, Matthias

AU - Langelage, Thomas

AU - Veith, Uli

AU - Ewerth, Ralph

AU - Freisleben, Bernd

AU - Müller-Budack, Eric

N1 - Funding information: This work is financially supported by the German Federal Ministry for Economic Affairs and Energy (BMWi) in the ZIM Programme.

PY - 2017/7/5

Y1 - 2017/7/5

AB - While digitization has changed the workflow of professional media production, the content-based labeling of image sequences and video footage, necessary for all subsequent stages of film and television production, archival or marketing is typically still performed manually and thus quite time-consuming. In this paper, we present deep learning approaches to support professional media production. In particular, novel algorithms for visual concept detection, similarity search, face detection, face recognition and face clustering are combined in a multimedia tool for effective video inspection and retrieval. The analysis algorithms for concept detection and similarity search are combined in a multi-task learning approach to share network weights, saving almost half of the computation time. Furthermore, a new visual concept lexicon tailored to fast video retrieval for media production and novel visualization components are introduced. Experimental results show the quality of the proposed approaches. For example, concept detection achieves a mean average precision of approximately 90% on the top-100 video shots, and face recognition clearly outperforms the baseline on the public Movie Trailers Face Dataset.

KW - Deep learning

KW - Face recognition

KW - Image and video analysis

KW - Media production

KW - Similarity search

KW - Visual concept detection

UR - http://www.scopus.com/inward/record.url?scp=85021790014&partnerID=8YFLogxK

U2 - 10.1007/s11042-017-4962-9

DO - 10.1007/s11042-017-4962-9

M3 - Article

AN - SCOPUS:85021790014

VL - 76

SP - 22169

EP - 22194

JO - Multimedia Tools and Applications

JF - Multimedia Tools and Applications

SN - 1380-7501

IS - 21

ER -
