Deep learning for content-based video retrieval in film and television production

Research output: Contribution to journal › Article › Research › peer review

Authors

  • Markus Mühling
  • Nikolaus Korfhage
  • Christian Otto
  • Matthias Springstein
  • Thomas Langelage
  • Uli Veith
  • Ralph Ewerth
  • Bernd Freisleben
  • Eric Müller-Budack

External Research Organisations

  • Philipps-Universität Marburg
  • German National Library of Science and Technology (TIB)
  • taglicht media Film- & Fernsehproduktion GmbH

Details

Original language: English
Pages (from-to): 22169-22194
Number of pages: 26
Journal: Multimedia tools and applications
Volume: 76
Issue number: 21
Publication status: Published - 5 Jul 2017

Abstract

While digitization has changed the workflow of professional media production, the content-based labeling of image sequences and video footage, necessary for all subsequent stages of film and television production, archival, or marketing, is typically still performed manually and is thus quite time-consuming. In this paper, we present deep learning approaches to support professional media production. In particular, novel algorithms for visual concept detection, similarity search, face detection, face recognition, and face clustering are combined in a multimedia tool for effective video inspection and retrieval. The analysis algorithms for concept detection and similarity search are combined in a multi-task learning approach to share network weights, saving almost half of the computation time. Furthermore, a new visual concept lexicon tailored to fast video retrieval for media production and novel visualization components are introduced. Experimental results show the quality of the proposed approaches. For example, concept detection achieves a mean average precision of approximately 90% on the top-100 video shots, and face recognition clearly outperforms the baseline on the public Movie Trailers Face Dataset.
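The idea of sharing network weights between concept detection and similarity search can be pictured with a minimal multi-task sketch. The code below is a hypothetical illustration, not the authors' architecture: the backbone choice (ResNet-50), the head sizes, and the names (MultiTaskVideoNet, num_concepts, embedding_dim) are assumptions. It only shows how a single shared forward pass can feed both a multi-label concept head and a similarity embedding head, which is where a runtime saving over two separate networks would come from.

# Minimal, hypothetical sketch (PyTorch), not the authors' implementation:
# a shared CNN backbone with two heads -- multi-label concept logits and an
# L2-normalized embedding for similarity search -- so one forward pass
# serves both tasks instead of running two separate networks.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models


class MultiTaskVideoNet(nn.Module):          # name and sizes are illustrative
    def __init__(self, num_concepts: int = 100, embedding_dim: int = 128):
        super().__init__()
        backbone = models.resnet50(weights=None)     # shared feature extractor, random init
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()                  # drop the classification layer
        self.backbone = backbone
        self.concept_head = nn.Linear(feat_dim, num_concepts)     # concept detection
        self.embedding_head = nn.Linear(feat_dim, embedding_dim)  # similarity search

    def forward(self, frames: torch.Tensor):
        shared = self.backbone(frames)               # computed once, reused by both heads
        concept_logits = self.concept_head(shared)
        embedding = F.normalize(self.embedding_head(shared), dim=1)
        return concept_logits, embedding


# One forward pass yields both outputs; sharing the backbone is where the
# saving relative to two independent networks comes from.
model = MultiTaskVideoNet()
logits, emb = model(torch.randn(2, 3, 224, 224))
print(logits.shape, emb.shape)   # torch.Size([2, 100]) torch.Size([2, 128])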

Keywords

    Deep learning, Face recognition, Image and video analysis, Media production, Similarity search, Visual concept detection

Cite this

Deep learning for content-based video retrieval in film and television production. / Mühling, Markus; Korfhage, Nikolaus; Otto, Christian et al.
In: Multimedia tools and applications, Vol. 76, No. 21, 05.07.2017, p. 22169-22194.

Mühling, M, Korfhage, N, Otto, C, Springstein, M, Langelage, T, Veith, U, Ewerth, R, Freisleben, B & Müller-Budack, E 2017, 'Deep learning for content-based video retrieval in film and television production', Multimedia tools and applications, vol. 76, no. 21, pp. 22169-22194. https://doi.org/10.1007/s11042-017-4962-9
Mühling, M., Korfhage, N., Otto, C., Springstein, M., Langelage, T., Veith, U., Ewerth, R., Freisleben, B., & Müller-Budack, E. (2017). Deep learning for content-based video retrieval in film and television production. Multimedia tools and applications, 76(21), 22169-22194. https://doi.org/10.1007/s11042-017-4962-9
Mühling M, Korfhage N, Otto C, Springstein M, Langelage T, Veith U et al. Deep learning for content-based video retrieval in film and television production. Multimedia tools and applications. 2017 Jul 5;76(21):22169-22194. doi: 10.1007/s11042-017-4962-9
Mühling, Markus ; Korfhage, Nikolaus ; Otto, Christian et al. / Deep learning for content-based video retrieval in film and television production. In: Multimedia tools and applications. 2017 ; Vol. 76, No. 21. pp. 22169-22194.
@article{af220cec09dd4780abb49993d89b4afb,
title = "Deep learning for content-based video retrieval in film and television production",
abstract = "While digitization has changed the workflow of professional media production, the content-based labeling of image sequences and video footage, necessary for all subsequent stages of film and television production, archival or marketing is typically still performed manually and thus quite time-consuming. In this paper, we present deep learning approaches to support professional media production. In particular, novel algorithms for visual concept detection, similarity search, face detection, face recognition and face clustering are combined in a multimedia tool for effective video inspection and retrieval. The analysis algorithms for concept detection and similarity search are combined in a multi-task learning approach to share network weights, saving almost half of the computation time. Furthermore, a new visual concept lexicon tailored to fast video retrieval for media production and novel visualization components are introduced. Experimental results show the quality of the proposed approaches. For example, concept detection achieves a mean average precision of approximately 90% on the top-100 video shots, and face recognition clearly outperforms the baseline on the public Movie Trailers Face Dataset.",
keywords = "Deep learning, Face recognition, Image and video analysis, Media production, Similarity search, Visual concept detection",
author = "Markus M{\"u}hling and Nikolaus Korfhage and Christian Otto and Matthias Springstein and Thomas Langelage and Uli Veith and Ralph Ewerth and Bernd Freisleben and Eric M{\"u}ller-Budack",
note = "Funding information: This work is financially supported by the German Federal Ministry for Economic Affairs and Energy (BMWi) in the ZIM Programme.",
year = "2017",
month = jul,
day = "5",
doi = "10.1007/s11042-017-4962-9",
language = "English",
volume = "76",
pages = "22169--22194",
journal = "Multimedia tools and applications",
issn = "1380-7501",
publisher = "Springer Netherlands",
number = "21",

}

TY - JOUR

T1 - Deep learning for content-based video retrieval in film and television production

AU - Mühling, Markus

AU - Korfhage, Nikolaus

AU - Otto, Christian

AU - Springstein, Matthias

AU - Langelage, Thomas

AU - Veith, Uli

AU - Ewerth, Ralph

AU - Freisleben, Bernd

AU - Müller-Budack, Eric

N1 - Funding information: This work is financially supported by the German Federal Ministry for Economic Affairs and Energy (BMWi) in the ZIM Programme.

PY - 2017/7/5

Y1 - 2017/7/5

N2 - While digitization has changed the workflow of professional media production, the content-based labeling of image sequences and video footage, necessary for all subsequent stages of film and television production, archival or marketing is typically still performed manually and thus quite time-consuming. In this paper, we present deep learning approaches to support professional media production. In particular, novel algorithms for visual concept detection, similarity search, face detection, face recognition and face clustering are combined in a multimedia tool for effective video inspection and retrieval. The analysis algorithms for concept detection and similarity search are combined in a multi-task learning approach to share network weights, saving almost half of the computation time. Furthermore, a new visual concept lexicon tailored to fast video retrieval for media production and novel visualization components are introduced. Experimental results show the quality of the proposed approaches. For example, concept detection achieves a mean average precision of approximately 90% on the top-100 video shots, and face recognition clearly outperforms the baseline on the public Movie Trailers Face Dataset.

AB - While digitization has changed the workflow of professional media production, the content-based labeling of image sequences and video footage, necessary for all subsequent stages of film and television production, archival or marketing is typically still performed manually and thus quite time-consuming. In this paper, we present deep learning approaches to support professional media production. In particular, novel algorithms for visual concept detection, similarity search, face detection, face recognition and face clustering are combined in a multimedia tool for effective video inspection and retrieval. The analysis algorithms for concept detection and similarity search are combined in a multi-task learning approach to share network weights, saving almost half of the computation time. Furthermore, a new visual concept lexicon tailored to fast video retrieval for media production and novel visualization components are introduced. Experimental results show the quality of the proposed approaches. For example, concept detection achieves a mean average precision of approximately 90% on the top-100 video shots, and face recognition clearly outperforms the baseline on the public Movie Trailers Face Dataset.

KW - Deep learning

KW - Face recognition

KW - Image and video analysis

KW - Media production

KW - Similarity search

KW - Visual concept detection

UR - http://www.scopus.com/inward/record.url?scp=85021790014&partnerID=8YFLogxK

U2 - 10.1007/s11042-017-4962-9

DO - 10.1007/s11042-017-4962-9

M3 - Article

AN - SCOPUS:85021790014

VL - 76

SP - 22169

EP - 22194

JO - Multimedia tools and applications

JF - Multimedia tools and applications

SN - 1380-7501

IS - 21

ER -