Towards Temporal URI Collections for Named Entities

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Authors

  • Sergej Wildemann
  • Helge Holzmann

Research Organisations

External Research Organisations

  • Internet Archive

Details

Original language: English
Title of host publication: 2019 ACM/IEEE Joint Conference on Digital Libraries
Subtitle of host publication: JCDL 2019
Editors: Maria Bonn, Dan Wu, Stephen J. Downie, Alain Martaus
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 241-250
Number of pages: 10
ISBN (electronic): 978-1-7281-1547-4
ISBN (print): 978-1-7281-1548-1
Publication status: Published - 2019
Event: 19th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2019 - Urbana-Champaign, United States
Duration: 2 Jun 2019 to 6 Jun 2019
Conference number: 19

Publication series

Name: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
Volume: 2019
ISSN (Print): 1552-5996

Abstract

Web archives are crucial endeavors for preserving the Web of the past and provide a valuable resource for researchers from different disciplines. Due to their size, navigation in these collections is often limited to specifying a URI and the desired date. However, typical research questions often revolve around the evolution of entities rather than specific websites. Full-text search often seems to be the first choice for looking up web pages, and it provides a quick way to find the best match for a keyword, but its diversified ranking is not designed for compiling reliable entity-related collections. Further, it generally ignores the temporal relevance that is needed to find pages from the past, e.g., in web archives. In this paper, we present a collection of ranked resource identifiers that characterize named entities over time. For this purpose, different datasets were collected and evaluated by comparing each with a combination of the others. Benchmarked against web search engines, our approach achieves a precision of 83.3% and shows promising results for high-quality lookups and temporal collection building. To avoid relying only on existing datasets, we have implemented an interactive platform that brings humans into the loop to expand the collection by contributing URIs, metadata, and temporal information, as well as to correct errors.
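
The abstract does not spell out how the collection is stored or queried. As a rough illustration only, the idea of ranked resource identifiers that characterize an entity over time could be sketched as below. The record layout (`TemporalURI`), the `lookup` helper, the score values, and the sample data are all assumptions made for this sketch and are not taken from the paper.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class TemporalURI:
    """One URI attributed to an entity for a limited period (hypothetical schema)."""
    uri: str
    score: float                         # assumed ranking score; higher ranks first
    valid_from: Optional[date] = None    # None means the start is unknown or open
    valid_until: Optional[date] = None   # None means the URI is still valid

    def covers(self, day: date) -> bool:
        """True if `day` falls inside this record's validity interval."""
        starts_ok = self.valid_from is None or self.valid_from <= day
        ends_ok = self.valid_until is None or day <= self.valid_until
        return starts_ok and ends_ok


def lookup(collection: Dict[str, List[TemporalURI]], entity: str, day: date) -> List[str]:
    """Return the URIs recorded for `entity` that were valid on `day`, best first."""
    candidates = [r for r in collection.get(entity, []) if r.covers(day)]
    ranked = sorted(candidates, key=lambda r: r.score, reverse=True)
    return [r.uri for r in ranked]


# Made-up illustrative data, not drawn from the paper's collection.
collection = {
    "Example Corp": [
        TemporalURI("http://example-corp.example/", 0.9, date(1998, 1, 1), date(2009, 12, 31)),
        TemporalURI("https://example.example/corp", 0.8, date(2010, 1, 1), None),
        TemporalURI("https://fans.example/example-corp", 0.4, date(2005, 6, 1), None),
    ]
}

print(lookup(collection, "Example Corp", date(2007, 3, 1)))
print(lookup(collection, "Example Corp", date(2015, 3, 1)))
```

In this sketch a lookup takes an entity name and a date rather than a URI and a date, which mirrors the shift from website-centric to entity-centric navigation that the abstract motivates. The returned URIs could then be passed to a web archive together with the same date.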

Keywords

    Collaborative Knowledge, Temporal Information Retrieval, Web Archives

Cite this

Towards Temporal URI Collections for Named Entities. / Wildemann, Sergej; Holzmann, Helge.
2019 ACM/IEEE Joint Conference on Digital Libraries: JCDL 2019. ed. / Maria Bonn; Dan Wu; Stephen J. Downie; Alain Martaus. Institute of Electrical and Electronics Engineers Inc., 2019. p. 241-250 (Proceedings of the ACM/IEEE Joint Conference on Digital Libraries; Vol. 2019).

Wildemann, S & Holzmann, H 2019, Towards Temporal URI Collections for Named Entities. in M Bonn, D Wu, SJ Downie & A Martaus (eds), 2019 ACM/IEEE Joint Conference on Digital Libraries: JCDL 2019. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, vol. 2019, Institute of Electrical and Electronics Engineers Inc., pp. 241-250, 19th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2019, Urbana-Champaign, Illinois, United States, 2 Jun 2019. https://doi.org/10.1109/JCDL.2019.00-68
Wildemann, S., & Holzmann, H. (2019). Towards Temporal URI Collections for Named Entities. In M. Bonn, D. Wu, S. J. Downie, & A. Martaus (Eds.), 2019 ACM/IEEE Joint Conference on Digital Libraries: JCDL 2019 (pp. 241-250). (Proceedings of the ACM/IEEE Joint Conference on Digital Libraries; Vol. 2019). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/JCDL.2019.00-68
Wildemann S, Holzmann H. Towards Temporal URI Collections for Named Entities. In Bonn M, Wu D, Downie SJ, Martaus A, editors, 2019 ACM/IEEE Joint Conference on Digital Libraries: JCDL 2019. Institute of Electrical and Electronics Engineers Inc. 2019. p. 241-250. (Proceedings of the ACM/IEEE Joint Conference on Digital Libraries). doi: 10.1109/JCDL.2019.00-68
Wildemann, Sergej ; Holzmann, Helge. / Towards Temporal URI Collections for Named Entities. 2019 ACM/IEEE Joint Conference on Digital Libraries: JCDL 2019. editor / Maria Bonn ; Dan Wu ; Stephen J. Downie ; Alain Martaus. Institute of Electrical and Electronics Engineers Inc., 2019. pp. 241-250 (Proceedings of the ACM/IEEE Joint Conference on Digital Libraries).
@inproceedings{cc8a1a92c5d544e3b4c9001d83ba86ed,
title = "Towards Temporal URI Collections for Named Entities",
abstract = "Web archives are crucial endeavors for preserving the Web of the past and provide a valuable resource for researchers from different disciplines. Due to their size, navigation in these collections is often limited to specifying a URI and the desired date. However, typical research questions often revolve around the evolution of entities rather than specific websites. Full-text search often seems to be the first choice for looking up web pages, and it provides a quick way to find the best match for a keyword, but its diversified ranking is not designed for compiling reliable entity-related collections. Further, it generally ignores the temporal relevance that is needed to find pages from the past, e.g., in web archives. In this paper, we present a collection of ranked resource identifiers that characterize named entities over time. For this purpose, different datasets were collected and evaluated by comparing each with a combination of the others. Benchmarked against web search engines, our approach achieves a precision of 83.3% and shows promising results for high-quality lookups and temporal collection building. To avoid relying only on existing datasets, we have implemented an interactive platform that brings humans into the loop to expand the collection by contributing URIs, metadata, and temporal information, as well as to correct errors.",
keywords = "Collaborative Knowledge, Temporal Information Retrieval, Web Archives",
author = "Sergej Wildemann and Helge Holzmann",
note = "Funding Information: The work was partially funded by the European Commission for the ERC Advanced Grant ALEXANDRIA (No. 339233).; 19th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2019 ; Conference date: 02-06-2019 Through 06-06-2019",
year = "2019",
doi = "10.1109/JCDL.2019.00-68",
language = "English",
isbn = "978-1-7281-1548-1",
series = "Proceedings of the ACM/IEEE Joint Conference on Digital Libraries",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "241--250",
editor = "Maria Bonn and Dan Wu and Downie, {Stephen J.} and Alain Martaus",
booktitle = "2019 ACM/IEEE Joint Conference on Digital Libraries",
address = "United States",

}

TY - GEN

T1 - Towards Temporal URI Collections for Named Entities

AU - Wildemann, Sergej

AU - Holzmann, Helge

N1 - Conference code: 19

PY - 2019

Y1 - 2019

AB - Web archives are crucial endeavors for preserving the Web of the past and provide a valuable resource for researchers from different disciplines. Due to their size, navigation in these collections is often limited to specifying a URI and the desired date. However, typical research questions often revolve around the evolution of entities rather than specific websites. Full-text search often seems to be the first choice for looking up web pages, and it provides a quick way to find the best match for a keyword, but its diversified ranking is not designed for compiling reliable entity-related collections. Further, it generally ignores the temporal relevance that is needed to find pages from the past, e.g., in web archives. In this paper, we present a collection of ranked resource identifiers that characterize named entities over time. For this purpose, different datasets were collected and evaluated by comparing each with a combination of the others. Benchmarked against web search engines, our approach achieves a precision of 83.3% and shows promising results for high-quality lookups and temporal collection building. To avoid relying only on existing datasets, we have implemented an interactive platform that brings humans into the loop to expand the collection by contributing URIs, metadata, and temporal information, as well as to correct errors.

KW - Collaborative Knowledge

KW - Temporal Information Retrieval

KW - Web Archives

UR - http://www.scopus.com/inward/record.url?scp=85071043845&partnerID=8YFLogxK

U2 - 10.1109/JCDL.2019.00-68

DO - 10.1109/JCDL.2019.00-68

M3 - Conference contribution

AN - SCOPUS:85071043845

SN - 978-1-7281-1548-1

T3 - Proceedings of the ACM/IEEE Joint Conference on Digital Libraries

SP - 241

EP - 250

BT - 2019 ACM/IEEE Joint Conference on Digital Libraries

A2 - Bonn, Maria

A2 - Wu, Dan

A2 - Downie, Stephen J.

A2 - Martaus, Alain

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 19th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2019

Y2 - 2 June 2019 through 6 June 2019

ER -