Image Analytics in Web Archives

Eric Müller-Budack; Kader Pustu-Iren; Sebastian Diering; Matthias Springstein; Ralph Ewerth

doi:10.1007/978-3-030-63291-5_11

Details

Original language	English
Title of host publication	The Past Web
Subtitle	Exploring Web Archives
Place of publication	Cham
Publisher	Springer International Publishing AG
Pages	141-151
Number of pages	11
ISBN (electronic)	9783030632915
ISBN (print)	9783030632908
Publication status	Published - 1 July 2021

Abstract

The multimedia content published on the World Wide Web is constantly growing and contains valuable information in various domains. The Internet Archive initiative has gathered billions of time-versioned web pages since the mid-nineties, but unfortunately, they are rarely provided with appropriate metadata. This lack of structured data limits the exploration of the archives, and automated solutions are required to enable semantic search. While many approaches exploit the textual content of news in the Internet Archive to detect named entities and their relations, visual information is generally disregarded. In this chapter, we present an approach that leverages deep learning techniques for the identification of public personalities in the images of news articles stored in the Internet Archive. In addition, we elaborate on how this approach can be extended to enable detection of other entity types such as locations or events. The approach complements named entity recognition and linking tools for text and allows researchers and analysts to track the media coverage and relations of persons more precisely. We have analysed more than one million images from news articles in the Internet Archive and demonstrated the feasibility of the approach with two use cases in different domains: politics and entertainment.

ASJC Scopus subject areas

Computer Science (all)
General Computer Science
Arts and Humanities (all)
General Arts and Humanities
Social Sciences (all)
General Social Sciences

Cite

Image Analytics in Web Archives. / Müller-Budack, Eric; Pustu-Iren, Kader; Diering, Sebastian et al.
The Past Web: Exploring Web Archives. Cham: Springer International Publishing AG, 2021. pp. 141-151.

Research output: Chapter in book/report/conference proceeding › Contribution to book/anthology › Research › peer review

Müller-Budack, E, Pustu-Iren, K, Diering, S, Springstein, M & Ewerth, R 2021, Image Analytics in Web Archives. in The Past Web: Exploring Web Archives. Springer International Publishing AG, Cham, pp. 141-151. https://doi.org/10.1007/978-3-030-63291-5_11

Müller-Budack, E., Pustu-Iren, K., Diering, S., Springstein, M., & Ewerth, R. (2021). Image Analytics in Web Archives. In The Past Web: Exploring Web Archives (pp. 141-151). Springer International Publishing AG. https://doi.org/10.1007/978-3-030-63291-5_11

Müller-Budack E, Pustu-Iren K, Diering S, Springstein M, Ewerth R. Image Analytics in Web Archives. In The Past Web: Exploring Web Archives. Cham: Springer International Publishing AG. 2021. p. 141-151. Epub 2021 Jun 30. doi: 10.1007/978-3-030-63291-5_11

Müller-Budack, Eric ; Pustu-Iren, Kader ; Diering, Sebastian et al. / Image Analytics in Web Archives. The Past Web: Exploring Web Archives. Cham : Springer International Publishing AG, 2021. pp. 141-151

@inbook{ce988b142ce746039920148034f3cf21,

title = "Image Analytics in Web Archives",

abstract = "The multimedia content published on the World Wide Web is constantly growing and contains valuable information in various domains. The Internet Archive initiative has gathered billions of time-versioned web pages since the mid-nineties, but unfortunately, they are rarely provided with appropriate metadata. This lack of structured data limits the exploration of the archives, and automated solutions are required to enable semantic search. While many approaches exploit the textual content of news in the Internet Archive to detect named entities and their relations, visual information is generally disregarded. In this chapter, we present an approach that leverages deep learning techniques for the identification of public personalities in the images of news articles stored in the Internet Archive. In addition, we elaborate on how this approach can be extended to enable detection of other entity types such as locations or events. The approach complements named entity recognition and linking tools for text and allows researchers and analysts to track the media coverage and relations of persons more precisely. We have analysed more than one million images from news articles in the Internet Archive and demonstrated the feasibility of the approach with two use cases in different domains: politics and entertainment.",

author = "Eric M{\"u}ller-Budack and Kader Pustu-Iren and Sebastian Diering and Matthias Springstein and Ralph Ewerth",

note = "This work is financially supported by the German Research Foundation (DFG: Deutsche Forschungsgemeinschaft, project number: 388420599). This work was partially funded by the European Commission for the ERC Advanced Grant ALEXANDRIA (No. 339233, Wolfgang Nejdl).",

year = "2021",

month = jul,

day = "1",

doi = "10.1007/978-3-030-63291-5_11",

language = "English",

isbn = "9783030632908",

pages = "141--151",

booktitle = "The Past Web",

publisher = "Springer International Publishing AG",

address = "Switzerland",

}

TY - CHAP

T1 - Image Analytics in Web Archives

AU - Müller-Budack, Eric

AU - Pustu-Iren, Kader

AU - Diering, Sebastian

AU - Springstein, Matthias

AU - Ewerth, Ralph

N1 - This work is financially supported by the German Research Foundation (DFG: Deutsche Forschungsgemeinschaft, project number: 388420599). This work was partially funded by the European Commission for the ERC Advanced Grant ALEXANDRIA (No. 339233, Wolfgang Nejdl).

PY - 2021/7/1

Y1 - 2021/7/1

AB - The multimedia content published on the World Wide Web is constantly growing and contains valuable information in various domains. The Internet Archive initiative has gathered billions of time-versioned web pages since the mid-nineties, but unfortunately, they are rarely provided with appropriate metadata. This lack of structured data limits the exploration of the archives, and automated solutions are required to enable semantic search. While many approaches exploit the textual content of news in the Internet Archive to detect named entities and their relations, visual information is generally disregarded. In this chapter, we present an approach that leverages deep learning techniques for the identification of public personalities in the images of news articles stored in the Internet Archive. In addition, we elaborate on how this approach can be extended to enable detection of other entity types such as locations or events. The approach complements named entity recognition and linking tools for text and allows researchers and analysts to track the media coverage and relations of persons more precisely. We have analysed more than one million images from news articles in the Internet Archive and demonstrated the feasibility of the approach with two use cases in different domains: politics and entertainment.

UR - http://www.scopus.com/inward/record.url?scp=85150072196&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-63291-5_11

DO - 10.1007/978-3-030-63291-5_11

M3 - Contribution to book/anthology

AN - SCOPUS:85150072196

SN - 9783030632908

SP - 141

EP - 151

BT - The Past Web

PB - Springer International Publishing AG

CY - Cham

ER -
