Image Analytics in Web Archives

Publication: Contribution to book/report/anthology/conference proceedings > Chapter in book/anthology > Research > Peer-reviewed

Authors

  • Eric Müller-Budack
  • Kader Pustu-Iren
  • Sebastian Diering
  • Matthias Springstein
  • Ralph Ewerth

External organisations

  • Technische Informationsbibliothek (TIB) Leibniz-Informationszentrum Technik und Naturwissenschaften und Universitätsbibliothek

Details

Original language: English
Title of anthology: The Past Web
Subtitle: Exploring Web Archives
Place of publication: Cham
Publisher: Springer International Publishing AG
Pages: 141-151
Number of pages: 11
ISBN (electronic): 9783030632915
ISBN (print): 9783030632908
Publication status: Published (1 July 2021)

Abstract

The multimedia content published on the World Wide Web is constantly growing and contains valuable information in various domains. The Internet Archive initiative has gathered billions of time-versioned web pages since the mid-nineties, but unfortunately, they are rarely provided with appropriate metadata. This lack of structured data limits the exploration of the archives, and automated solutions are required to enable semantic search. While many approaches exploit the textual content of news in the Internet Archive to detect named entities and their relations, visual information is generally disregarded. In this chapter, we present an approach that leverages deep learning techniques for the identification of public personalities in the images of news articles stored in the Internet Archive. In addition, we elaborate on how this approach can be extended to enable detection of other entity types such as locations or events. The approach complements named entity recognition and linking tools for text and allows researchers and analysts to track the media coverage and relations of persons more precisely. We have analysed more than one million images from news articles in the Internet Archive and demonstrated the feasibility of the approach with two use cases in different domains: politics and entertainment.


Cite this

Image Analytics in Web Archives. / Müller-Budack, Eric; Pustu-Iren, Kader; Diering, Sebastian et al.
The Past Web: Exploring Web Archives. Cham: Springer International Publishing AG, 2021. pp. 141-151.


Müller-Budack, E, Pustu-Iren, K, Diering, S, Springstein, M & Ewerth, R 2021, Image Analytics in Web Archives. in The Past Web: Exploring Web Archives. Springer International Publishing AG, Cham, pp. 141-151. https://doi.org/10.1007/978-3-030-63291-5_11
Müller-Budack, E., Pustu-Iren, K., Diering, S., Springstein, M., & Ewerth, R. (2021). Image Analytics in Web Archives. In The Past Web: Exploring Web Archives (pp. 141-151). Springer International Publishing AG. https://doi.org/10.1007/978-3-030-63291-5_11
Müller-Budack E, Pustu-Iren K, Diering S, Springstein M, Ewerth R. Image Analytics in Web Archives. In The Past Web: Exploring Web Archives. Cham: Springer International Publishing AG. 2021. p. 141-151. Epub 2021 Jun 30. doi: 10.1007/978-3-030-63291-5_11
Müller-Budack, Eric ; Pustu-Iren, Kader ; Diering, Sebastian et al. / Image Analytics in Web Archives. The Past Web: Exploring Web Archives. Cham : Springer International Publishing AG, 2021. pp. 141-151
@incollection{ce988b142ce746039920148034f3cf21,
title = "Image Analytics in Web Archives",
abstract = "The multimedia content published on the World Wide Web is constantly growing and contains valuable information in various domains. The Internet Archive initiative has gathered billions of time-versioned web pages since the mid-nineties, but unfortunately, they are rarely provided with appropriate metadata. This lack of structured data limits the exploration of the archives, and automated solutions are required to enable semantic search. While many approaches exploit the textual content of news in the Internet Archive to detect named entities and their relations, visual information is generally disregarded. In this chapter, we present an approach that leverages deep learning techniques for the identification of public personalities in the images of news articles stored in the Internet Archive. In addition, we elaborate on how this approach can be extended to enable detection of other entity types such as locations or events. The approach complements named entity recognition and linking tools for text and allows researchers and analysts to track the media coverage and relations of persons more precisely. We have analysed more than one million images from news articles in the Internet Archive and demonstrated the feasibility of the approach with two use cases in different domains: politics and entertainment.",
author = "Eric M{\"u}ller-Budack and Kader Pustu-Iren and Sebastian Diering and Matthias Springstein and Ralph Ewerth",
note = "This work is financially supported by the German Research Foundation (DFG: Deutsche Forschungsgemeinschaft, project number: 388420599). This work was partially funded by the European Commission for the ERC Advanced Grant ALEXANDRIA (No. 339233, Wolfgang Nejdl).",
year = "2021",
month = jul,
day = "1",
doi = "10.1007/978-3-030-63291-5_11",
language = "English",
isbn = "9783030632908",
pages = "141--151",
booktitle = "The Past Web: Exploring Web Archives",
publisher = "Springer International Publishing AG",
address = "Switzerland",

}


TY - CHAP

T1 - Image Analytics in Web Archives

AU - Müller-Budack, Eric

AU - Pustu-Iren, Kader

AU - Diering, Sebastian

AU - Springstein, Matthias

AU - Ewerth, Ralph

N1 - This work is financially supported by the German Research Foundation (DFG: Deutsche Forschungsgemeinschaft, project number: 388420599). This work was partially funded by the European Commission for the ERC Advanced Grant ALEXANDRIA (No. 339233, Wolfgang Nejdl).

PY - 2021/7/1

Y1 - 2021/7/1

N2 - The multimedia content published on the World Wide Web is constantly growing and contains valuable information in various domains. The Internet Archive initiative has gathered billions of time-versioned web pages since the mid-nineties, but unfortunately, they are rarely provided with appropriate metadata. This lack of structured data limits the exploration of the archives, and automated solutions are required to enable semantic search. While many approaches exploit the textual content of news in the Internet Archive to detect named entities and their relations, visual information is generally disregarded. In this chapter, we present an approach that leverages deep learning techniques for the identification of public personalities in the images of news articles stored in the Internet Archive. In addition, we elaborate on how this approach can be extended to enable detection of other entity types such as locations or events. The approach complements named entity recognition and linking tools for text and allows researchers and analysts to track the media coverage and relations of persons more precisely. We have analysed more than one million images from news articles in the Internet Archive and demonstrated the feasibility of the approach with two use cases in different domains: politics and entertainment.

AB - The multimedia content published on the World Wide Web is constantly growing and contains valuable information in various domains. The Internet Archive initiative has gathered billions of time-versioned web pages since the mid-nineties, but unfortunately, they are rarely provided with appropriate metadata. This lack of structured data limits the exploration of the archives, and automated solutions are required to enable semantic search. While many approaches exploit the textual content of news in the Internet Archive to detect named entities and their relations, visual information is generally disregarded. In this chapter, we present an approach that leverages deep learning techniques for the identification of public personalities in the images of news articles stored in the Internet Archive. In addition, we elaborate on how this approach can be extended to enable detection of other entity types such as locations or events. The approach complements named entity recognition and linking tools for text and allows researchers and analysts to track the media coverage and relations of persons more precisely. We have analysed more than one million images from news articles in the Internet Archive and demonstrated the feasibility of the approach with two use cases in different domains: politics and entertainment.

UR - http://www.scopus.com/inward/record.url?scp=85150072196&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-63291-5_11

DO - 10.1007/978-3-030-63291-5_11

M3 - Contribution to book/anthology

AN - SCOPUS:85150072196

SN - 9783030632908

SP - 141

EP - 151

BT - The Past Web

PB - Springer International Publishing AG

CY - Cham

ER -