Details
Original language | English |
---|---|
Title of host publication | 2019 ACM/IEEE Joint Conference on Digital Libraries |
Subtitle of host publication | JCDL 2019 |
Editors | Maria Bonn, Dan Wu, Stephen J. Downie, Alain Martaus |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 241-250 |
Number of pages | 10 |
ISBN (electronic) | 978-1-7281-1547-4 |
ISBN (print) | 978-1-7281-1548-1 |
Publication status | Published - 2019 |
Event | 19th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2019 - Urbana-Champaign, United States. Duration: 2 Jun 2019 → 6 Jun 2019. Conference number: 19 |
Publication series
Name | Proceedings of the ACM/IEEE Joint Conference on Digital Libraries |
---|---|
Volume | 2019 |
ISSN (Print) | 1552-5996 |
Abstract
Web archives represent crucial endeavors in preserving the Web from the past and provide a valuable resource for researchers of different disciplines. Due to their size, navigation in these collections is often limited to specifying a URI and the desired date. However, typical research questions often revolve around the evolution of entities instead of specific websites. Full-text search often seems to be the first choice for looking up web pages, but although it quickly yields the best match for a keyword, its diversified ranking is not designed for compiling reliable entity-related collections. Further, it generally ignores the temporal relevance that is needed to find pages from the past, e.g., in web archives. In this paper, we present a collection of ranked resource identifiers, characterizing named entities over time. For this purpose, different datasets were collected and evaluated by comparing each with a combination of others. Benchmarked against web search engines, our approach achieves a remarkable precision of 83.3% and shows promising results for high-quality lookups and temporal collection building. To avoid relying solely on existing datasets, we have implemented an interactive platform that brings humans into the loop to expand the collection by contributing URIs, metadata, and temporal information, as well as to correct errors.
Keywords
- Collaborative Knowledge, Temporal Information Retrieval, Web Archives
Cite this
Towards Temporal URI Collections for Named Entities. / Wildemann, Sergej; Holzmann, Helge. 2019 ACM/IEEE Joint Conference on Digital Libraries: JCDL 2019. ed. / Maria Bonn; Dan Wu; Stephen J. Downie; Alain Martaus. Institute of Electrical and Electronics Engineers Inc., 2019. p. 241-250 (Proceedings of the ACM/IEEE Joint Conference on Digital Libraries; Vol. 2019).
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Towards Temporal URI Collections for Named Entities
AU - Wildemann, Sergej
AU - Holzmann, Helge
N1 - Conference code: 19
PY - 2019
Y1 - 2019
AB - Web archives represent crucial endeavors in preserving the Web from the past and provide a valuable resource for researchers of different disciplines. Due to their size, navigation in these collections is often limited to specifying a URI and the desired date. However, typical research questions often revolve around the evolution of entities instead of specific websites. Full-text search often seems to be the first choice for looking up web pages, but although it quickly yields the best match for a keyword, its diversified ranking is not designed for compiling reliable entity-related collections. Further, it generally ignores the temporal relevance that is needed to find pages from the past, e.g., in web archives. In this paper, we present a collection of ranked resource identifiers, characterizing named entities over time. For this purpose, different datasets were collected and evaluated by comparing each with a combination of others. Benchmarked against web search engines, our approach achieves a remarkable precision of 83.3% and shows promising results for high-quality lookups and temporal collection building. To avoid relying solely on existing datasets, we have implemented an interactive platform that brings humans into the loop to expand the collection by contributing URIs, metadata, and temporal information, as well as to correct errors.
KW - Collaborative Knowledge
KW - Temporal Information Retrieval
KW - Web Archives
UR - http://www.scopus.com/inward/record.url?scp=85071043845&partnerID=8YFLogxK
U2 - 10.1109/JCDL.2019.00-68
DO - 10.1109/JCDL.2019.00-68
M3 - Conference contribution
AN - SCOPUS:85071043845
SN - 978-1-7281-1548-1
T3 - Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
SP - 241
EP - 250
BT - 2019 ACM/IEEE Joint Conference on Digital Libraries
A2 - Bonn, Maria
A2 - Wu, Dan
A2 - Downie, Stephen J.
A2 - Martaus, Alain
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 19th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2019
Y2 - 2 June 2019 through 6 June 2019
ER -