Details
Original language | English |
---|---|
Title of host publication | The Past Web |
Subtitle | Exploring Web Archives |
Place of publication | Cham |
Publisher | Springer International Publishing AG |
Pages | 141-151 |
Number of pages | 11 |
ISBN (electronic) | 9783030632915 |
ISBN (print) | 9783030632908 |
Publication status | Published - 1 July 2021 |
Abstract
The multimedia content published on the World Wide Web is constantly growing and contains valuable information in various domains. The Internet Archive initiative has gathered billions of time-versioned web pages since the mid-nineties, but unfortunately, they are rarely provided with appropriate metadata. This lack of structured data limits the exploration of the archives, and automated solutions are required to enable semantic search. While many approaches exploit the textual content of news in the Internet Archive to detect named entities and their relations, visual information is generally disregarded. In this chapter, we present an approach that leverages deep learning techniques for the identification of public personalities in the images of news articles stored in the Internet Archive. In addition, we elaborate on how this approach can be extended to enable detection of other entity types such as locations or events. The approach complements named entity recognition and linking tools for text and allows researchers and analysts to track the media coverage and relations of persons more precisely. We have analysed more than one million images from news articles in the Internet Archive and demonstrated the feasibility of the approach with two use cases in different domains: politics and entertainment.
ASJC Scopus subject areas
- Computer Science (all)
- General Computer Science
- Arts and Humanities (all)
- General Arts and Humanities
- Social Sciences (all)
- General Social Sciences
Cite this
The Past Web: Exploring Web Archives. Cham: Springer International Publishing AG, 2021. pp. 141-151.
Publication: Contribution to book/report/anthology/conference proceedings › Contribution to book/anthology › Research › Peer-reviewed
TY - CHAP
T1 - Image Analytics in Web Archives
AU - Müller-Budack, Eric
AU - Pustu-Iren, Kader
AU - Diering, Sebastian
AU - Springstein, Matthias
AU - Ewerth, Ralph
N1 - This work is financially supported by the German Research Foundation (DFG: Deutsche Forschungsgemeinschaft, project number: 388420599). This work was partially funded by the European Commission for the ERC Advanced Grant ALEXANDRIA (No. 339233, Wolfgang Nejdl).
PY - 2021/7/1
Y1 - 2021/7/1
UR - http://www.scopus.com/inward/record.url?scp=85150072196&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-63291-5_11
DO - 10.1007/978-3-030-63291-5_11
M3 - Contribution to book/anthology
AN - SCOPUS:85150072196
SN - 9783030632908
SP - 141
EP - 151
BT - The Past Web
PB - Springer International Publishing AG
CY - Cham
ER -