Finding person relations in image data of news collections in the internet archive

Research output: Chapter in book/report/conference proceeding; Conference contribution; Research; peer-reviewed

Authors

  • Eric Müller-Budack
  • Kader Pustu-Iren
  • Sebastian Diering
  • Ralph Ewerth

External Research Organisations

  • German National Library of Science and Technology (TIB)

Details

Original language: English
Title of host publication: Digital Libraries for Open Knowledge - 22nd International Conference on Theory and Practice of Digital Libraries, TPDL 2018, Proceedings
Editors: Eva Mendez, Cristina Ribeiro, Gabriel David, João Correia Lopes, Fabio Crestani
Publisher: Springer Verlag
Pages: 229-240
Number of pages: 12
ISBN (print): 9783030000653
Publication status: Published - 2018
Event: 22nd International Conference on Theory and Practice of Digital Libraries, TPDL 2018 - Porto, Portugal
Duration: 10 Sept 2018 to 13 Sept 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11057 LNCS
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Abstract

The amount of multimedia content on the World Wide Web is growing rapidly, and this content holds valuable information for many applications in different domains. The Internet Archive initiative has gathered billions of time-versioned web pages since the mid-nineties. However, this huge amount of data is rarely labeled with appropriate metadata, so automatic approaches are required to enable semantic search. Normally, the textual content of the Internet Archive is used to extract entities and their possible relations across domains such as politics and entertainment, whereas image and video content is usually disregarded. In this paper, we introduce a system for person recognition in the image content of web news stored in the Internet Archive. The system thus complements entity recognition in text and allows researchers and analysts to track the media coverage and relations of persons more precisely. Based on a deep learning face recognition approach, we propose a system that detects persons of interest and gathers sample material, which is subsequently used to identify them in the image data of the Internet Archive. We evaluate the performance of the face recognition system on an appropriate standard benchmark dataset and demonstrate the feasibility of the approach with two use cases.
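The abstract describes identifying persons of interest by comparing faces in archived images against sample material gathered for each person, and deriving relations from the results. The Python sketch below illustrates that general idea under explicit assumptions; it is not the authors' implementation. Face embeddings are taken as given (the deep face recognition model that would produce them is not shown), each person is represented by the mean of their sample embeddings, matching uses cosine similarity with a hypothetical threshold of 0.8, and a "relation" is approximated by two persons appearing in the same image. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical sketch: embeddings are plain lists of floats assumed to come from
# some face recognition model (not shown); names and the threshold are assumptions.

def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return vec if norm == 0 else [x / norm for x in vec]

def cosine(a, b):
    """Cosine similarity of two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def build_gallery(samples):
    """Average each person's sample embeddings into one unit-length reference vector."""
    gallery = {}
    for name, vecs in samples.items():
        mean = [sum(col) / len(vecs) for col in zip(*vecs)]
        gallery[name] = normalize(mean)
    return gallery

def identify(face, gallery, threshold=0.8):
    """Return the most similar person of interest, or None if below the (assumed) threshold."""
    best_name, best_sim = None, -1.0
    for name, ref in gallery.items():
        sim = cosine(face, ref)
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name if best_sim >= threshold else None

def co_occurrences(images, gallery, threshold=0.8):
    """Count how often two identified persons appear together in the same image."""
    counts = Counter()
    for faces in images:
        names = {identify(face, gallery, threshold) for face in faces}
        names.discard(None)  # drop faces that matched no person of interest
        for pair in combinations(sorted(names), 2):
            counts[pair] += 1
    return counts

# Toy 2-D "embeddings" for illustration only.
samples = {"A": [[1.0, 0.0], [0.9, 0.1]], "B": [[0.0, 1.0]]}
gallery = build_gallery(samples)
images = [[[1.0, 0.05], [0.05, 1.0]], [[0.95, 0.0]], [[-1.0, 0.0]]]
print(co_occurrences(images, gallery))  # Counter({('A', 'B'): 1})
```

In the toy data, only the first image contains both persons; the second contains only A, and the face in the third image falls below the threshold and is left unidentified. A real system would use high-dimensional embeddings and tune the threshold on a benchmark.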

Keywords

    Big data application, Deep learning, Face recognition, Internet Archive

Cite this

Standard: Finding person relations in image data of news collections in the internet archive. / Müller-Budack, Eric; Pustu-Iren, Kader; Diering, Sebastian et al.
Digital Libraries for Open Knowledge - 22nd International Conference on Theory and Practice of Digital Libraries, TPDL 2018, Proceedings. ed. / Eva Mendez; Cristina Ribeiro; Gabriel David; João Correia Lopes; Fabio Crestani. Springer Verlag, 2018. p. 229-240 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11057 LNCS).

Harvard: Müller-Budack, E, Pustu-Iren, K, Diering, S & Ewerth, R 2018, Finding person relations in image data of news collections in the internet archive. in E Mendez, C Ribeiro, G David, JC Lopes & F Crestani (eds), Digital Libraries for Open Knowledge - 22nd International Conference on Theory and Practice of Digital Libraries, TPDL 2018, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11057 LNCS, Springer Verlag, pp. 229-240, 22nd International Conference on Theory and Practice of Digital Libraries, TPDL 2018, Porto, Portugal, 10 Sept 2018. https://doi.org/10.1007/978-3-030-00066-0_20
APA: Müller-Budack, E., Pustu-Iren, K., Diering, S., & Ewerth, R. (2018). Finding person relations in image data of news collections in the internet archive. In E. Mendez, C. Ribeiro, G. David, J. C. Lopes, & F. Crestani (Eds.), Digital Libraries for Open Knowledge - 22nd International Conference on Theory and Practice of Digital Libraries, TPDL 2018, Proceedings (pp. 229-240). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11057 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-030-00066-0_20
Vancouver: Müller-Budack E, Pustu-Iren K, Diering S, Ewerth R. Finding person relations in image data of news collections in the internet archive. In Mendez E, Ribeiro C, David G, Lopes JC, Crestani F, editors, Digital Libraries for Open Knowledge - 22nd International Conference on Theory and Practice of Digital Libraries, TPDL 2018, Proceedings. Springer Verlag. 2018. p. 229-240. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-030-00066-0_20
Author: Müller-Budack, Eric ; Pustu-Iren, Kader ; Diering, Sebastian et al. / Finding person relations in image data of news collections in the internet archive. Digital Libraries for Open Knowledge - 22nd International Conference on Theory and Practice of Digital Libraries, TPDL 2018, Proceedings. editor / Eva Mendez ; Cristina Ribeiro ; Gabriel David ; João Correia Lopes ; Fabio Crestani. Springer Verlag, 2018. pp. 229-240 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{449cc597250c47a19751413c7de4e439,
title = "Finding person relations in image data of news collections in the internet archive",
keywords = "Big data application, Deep learning, Face recognition, Internet Archive",
author = "Eric M{\"u}ller-Budack and Kader Pustu-Iren and Sebastian Diering and Ralph Ewerth",
note = "Funding information: Acknowledgement. This work is financially supported by the German Research Foundation (DFG: Deutsche Forschungsgemeinschaft, project number: EW 134/4-1). The work was partially funded by the European Commission for the ERC Advanced Grant ALEXANDRIA (No. 339233, Wolfgang Nejdl).; 22nd International Conference on Theory and Practice of Digital Libraries, TPDL 2018 ; Conference date: 10-09-2018 Through 13-09-2018",
year = "2018",
doi = "10.1007/978-3-030-00066-0_20",
language = "English",
isbn = "9783030000653",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "229--240",
editor = "Eva Mendez and Cristina Ribeiro and Gabriel David and Lopes, {Jo{\~a}o Correia} and Fabio Crestani",
booktitle = "Digital Libraries for Open Knowledge - 22nd International Conference on Theory and Practice of Digital Libraries, TPDL 2018, Proceedings",
address = "Germany",

}

TY - GEN

T1 - Finding person relations in image data of news collections in the internet archive

AU - Müller-Budack, Eric

AU - Pustu-Iren, Kader

AU - Diering, Sebastian

AU - Ewerth, Ralph

N1 - Funding information: Acknowledgement. This work is financially supported by the German Research Foundation (DFG: Deutsche Forschungsgemeinschaft, project number: EW 134/4-1). The work was partially funded by the European Commission for the ERC Advanced Grant ALEXANDRIA (No. 339233, Wolfgang Nejdl).

PY - 2018

Y1 - 2018

KW - Big data application

KW - Deep learning

KW - Face recognition

KW - Internet Archive

UR - http://www.scopus.com/inward/record.url?scp=85053857463&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-00066-0_20

DO - 10.1007/978-3-030-00066-0_20

M3 - Conference contribution

AN - SCOPUS:85053857463

SN - 9783030000653

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 229

EP - 240

BT - Digital Libraries for Open Knowledge - 22nd International Conference on Theory and Practice of Digital Libraries, TPDL 2018, Proceedings

A2 - Mendez, Eva

A2 - Ribeiro, Cristina

A2 - David, Gabriel

A2 - Lopes, João Correia

A2 - Crestani, Fabio

PB - Springer Verlag

T2 - 22nd International Conference on Theory and Practice of Digital Libraries, TPDL 2018

Y2 - 10 September 2018 through 13 September 2018

ER -