Details
Original language | English |
---|---|
Title of host publication | Digital Libraries for Open Knowledge - 22nd International Conference on Theory and Practice of Digital Libraries, TPDL 2018, Proceedings |
Editors | Eva Mendez, Cristina Ribeiro, Gabriel David, João Correia Lopes, Fabio Crestani |
Publisher | Springer Verlag |
Pages | 229-240 |
Number of pages | 12 |
ISBN (print) | 9783030000653 |
Publication status | Published - 2018 |
Event | 22nd International Conference on Theory and Practice of Digital Libraries, TPDL 2018 - Porto, Portugal, 10 Sept 2018 to 13 Sept 2018 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 11057 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (electronic) | 1611-3349 |
Abstract
Multimedia content on the World Wide Web is growing rapidly and contains valuable information for many applications in different domains. The Internet Archive initiative has gathered billions of time-versioned web pages since the mid-nineties. However, this huge amount of data is rarely labeled with appropriate metadata, so automatic approaches are required to enable semantic search. Typically, the textual content of the Internet Archive is used to extract entities and their possible relations across domains such as politics and entertainment, whereas image and video content is usually disregarded. In this paper, we introduce a system for person recognition in the image content of web news stored in the Internet Archive. The system thus complements entity recognition in text and allows researchers and analysts to track the media coverage and relations of persons more precisely. Based on a deep learning face recognition approach, the system detects persons of interest, gathers sample material for them, and subsequently uses this material to identify them in the image data of the Internet Archive. We evaluate the performance of the face recognition system on a standard benchmark dataset and demonstrate the feasibility of the approach with two use cases.
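The identification step described in the abstract (sample material for each person of interest, then matching faces in archived images) can be illustrated with a minimal sketch. It assumes that a deep face recognition model has already produced one embedding vector per detected face. It also assumes that each person is represented by the average of their normalized sample embeddings, and that a face is assigned to the most similar person only if the cosine similarity reaches a threshold. The function names, the averaging strategy, and the threshold value are hypothetical choices for illustration and are not taken from the paper. Person detection and sample gathering are out of scope here. Relations are approximated by counting how often two identified persons appear in the same image.

```python
from collections import Counter
from itertools import combinations

import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarity."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


def build_references(samples: dict[str, np.ndarray]) -> tuple[list[str], np.ndarray]:
    """Hypothetical: average each person's sample embeddings into one unit reference vector."""
    names = sorted(samples)
    refs = np.vstack([l2_normalize(samples[n]).mean(axis=0) for n in names])
    return names, l2_normalize(refs)


def identify(face_embeddings: np.ndarray, names: list[str], refs: np.ndarray,
             threshold: float = 0.6) -> list:
    """Label each face with the most similar person, or None if below the (assumed) threshold."""
    faces = l2_normalize(face_embeddings)
    sims = faces @ refs.T  # cosine similarities, shape (n_faces, n_persons)
    best = sims.argmax(axis=1)
    return [names[j] if sims[i, j] >= threshold else None for i, j in enumerate(best)]


def co_occurrences(images: list[np.ndarray], names: list[str], refs: np.ndarray,
                   threshold: float = 0.6) -> Counter:
    """Count how often each pair of identified persons appears in the same image."""
    counts: Counter = Counter()
    for faces in images:
        people = {p for p in identify(faces, names, refs, threshold) if p is not None}
        for pair in combinations(sorted(people), 2):
            counts[pair] += 1
    return counts
```

Rejecting faces below the threshold matters in this setting. Most faces in news images belong to people outside the set of persons of interest, and a pure nearest-neighbour assignment would force every face onto some known person.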
Keywords
- Big data application, Deep learning, Face recognition, Internet Archive
ASJC Scopus subject areas
- Mathematics(all)
- Theoretical Computer Science
- Computer Science(all)
- General Computer Science
Cite this
Digital Libraries for Open Knowledge - 22nd International Conference on Theory and Practice of Digital Libraries, TPDL 2018, Proceedings. ed. / Eva Mendez; Cristina Ribeiro; Gabriel David; João Correia Lopes; Fabio Crestani. Springer Verlag, 2018. p. 229-240 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11057 LNCS).
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Finding person relations in image data of news collections in the internet archive
AU - Müller-Budack, Eric
AU - Pustu-Iren, Kader
AU - Diering, Sebastian
AU - Ewerth, Ralph
N1 - Funding information: Acknowledgement. This work is financially supported by the German Research Foundation (DFG: Deutsche Forschungsgemeinschaft, project number: EW 134/4-1). The work was partially funded by the European Commission for the ERC Advanced Grant ALEXANDRIA (No. 339233, Wolfgang Nejdl).
PY - 2018
Y1 - 2018
N2 - The amount of multimedia content in the World Wide Web is rapidly growing and contains valuable information for many applications in different domains. The Internet Archive initiative has gathered billions of time-versioned web pages since the mid-nineties. However, the huge amount of data is rarely labeled with appropriate metadata and automatic approaches are required to enable semantic search. Normally, the textual content of the Internet Archive is used to extract entities and their possible relations across domains such as politics and entertainment, whereas image and video content is usually disregarded. In this paper, we introduce a system for person recognition in image content of web news stored in the Internet Archive. Thus, the system complements entity recognition in text and allows researchers and analysts to track media coverage and relations of persons more precisely. Based on a deep learning face recognition approach, we suggest a system that detects persons of interest and gathers sample material, which is subsequently used to identify them in the image data of the Internet Archive. We evaluate the performance of the face recognition system on an appropriate standard benchmark dataset and demonstrate the feasibility of the approach with two use cases.
AB - The amount of multimedia content in the World Wide Web is rapidly growing and contains valuable information for many applications in different domains. The Internet Archive initiative has gathered billions of time-versioned web pages since the mid-nineties. However, the huge amount of data is rarely labeled with appropriate metadata and automatic approaches are required to enable semantic search. Normally, the textual content of the Internet Archive is used to extract entities and their possible relations across domains such as politics and entertainment, whereas image and video content is usually disregarded. In this paper, we introduce a system for person recognition in image content of web news stored in the Internet Archive. Thus, the system complements entity recognition in text and allows researchers and analysts to track media coverage and relations of persons more precisely. Based on a deep learning face recognition approach, we suggest a system that detects persons of interest and gathers sample material, which is subsequently used to identify them in the image data of the Internet Archive. We evaluate the performance of the face recognition system on an appropriate standard benchmark dataset and demonstrate the feasibility of the approach with two use cases.
KW - Big data application
KW - Deep learning
KW - Face recognition
KW - Internet Archive
UR - http://www.scopus.com/inward/record.url?scp=85053857463&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-00066-0_20
DO - 10.1007/978-3-030-00066-0_20
M3 - Conference contribution
AN - SCOPUS:85053857463
SN - 9783030000653
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 229
EP - 240
BT - Digital Libraries for Open Knowledge - 22nd International Conference on Theory and Practice of Digital Libraries, TPDL 2018, Proceedings
A2 - Mendez, Eva
A2 - Ribeiro, Cristina
A2 - David, Gabriel
A2 - Lopes, João Correia
A2 - Crestani, Fabio
PB - Springer Verlag
T2 - 22nd International Conference on Theory and Practice of Digital Libraries, TPDL 2018
Y2 - 10 September 2018 through 13 September 2018
ER -