Details
Original language | English |
---|---|
Title of host publication | SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval |
Pages | 915-918 |
Number of pages | 4 |
ISBN (electronic) | 9781450336215 |
Publication status | Published - 9 Aug 2015 |
Event | 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2015 - Santiago, Chile; Duration: 9 Aug 2015 → 13 Aug 2015 |
Publication series
Name | SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval |
---|
Abstract
Due to their first-hand, diverse and evolution-aware reflection of nearly all areas of life, web archives are emerging as gold mines for content analytics of many sorts. However, supporting search that goes beyond navigational search via URLs is a very challenging task in these unique structures with huge, redundant and noisy temporal content. In this paper, we address the search needs of expert users such as journalists, economists or historians for discovering a topic in time: given a query, the top-k returned results should give the best representative documents that cover the most interesting time periods for the topic. For this purpose, we propose a novel random walk-based model that integrates relevance, temporal authority, diversity and time in a unified framework. Preliminary experimental results on a large-scale, real-world web archival collection show that our method significantly improves on the state-of-the-art algorithms (i.e., PageRank) in ranking temporal web pages.
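The abstract does not give the model's equations, so the following is only a minimal sketch of the general idea of a random walk whose restart step is biased by relevance and time. It is a generic personalized PageRank computed by power iteration, not the authors' model: the function name `time_aware_walk`, the inputs `relevance` and `time_weight`, and the choice to multiply them into a teleport distribution are all assumptions made for illustration, and the diversity component mentioned in the abstract is not modelled here.

```python
def time_aware_walk(links, relevance, time_weight, damping=0.85, iters=50):
    """Personalized PageRank by power iteration (illustrative sketch only).

    links:       dict mapping each node to a list of its out-neighbours;
                 every neighbour must also appear as a key.
    relevance:   dict node -> non-negative query relevance (hypothetical input).
    time_weight: dict node -> non-negative temporal weight (hypothetical input).
    """
    nodes = list(links)

    # Assumed teleport distribution: relevance times temporal weight, normalised.
    raw = {n: relevance[n] * time_weight[n] for n in nodes}
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("at least one node needs a positive teleport weight")
    teleport = {n: raw[n] / total for n in nodes}

    rank = dict(teleport)
    for _ in range(iters):
        # Restart step: jump to a node according to the teleport distribution.
        nxt = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            out = links[n]
            if out:
                # Follow an outgoing link chosen uniformly at random.
                share = damping * rank[n] / len(out)
                for m in out:
                    nxt[m] += share
            else:
                # Dangling node: redistribute its mass via the teleport vector.
                for m in nodes:
                    nxt[m] += damping * rank[n] * teleport[m]
        rank = nxt
    return rank


# Two pages linking to each other; page "a" gets a higher temporal weight.
example = time_aware_walk(
    links={"a": ["b"], "b": ["a"]},
    relevance={"a": 1.0, "b": 1.0},
    time_weight={"a": 3.0, "b": 1.0},
)
```

In this toy graph the link structure is symmetric, so any difference between the two scores comes only from the assumed teleport weights; with uniform weights the walk reduces to ordinary PageRank, the baseline named in the abstract.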
Keywords
- Authority, Diversity, Temporal ranking, Web archive
ASJC Scopus subject areas
- Computer Science(all)
- Information Systems
- Software
Cite this
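Nguyen, Tu Ngoc; Kanhabua, Nattiya; Niederée, Claudia; Zhu, Xiaofei. A time-aware random walk model for finding important documents in web archives. SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2015. p. 915-918 (SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval).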
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - A time-aware random walk model for finding important documents in web archives
AU - Nguyen, Tu Ngoc
AU - Kanhabua, Nattiya
AU - Niederée, Claudia
AU - Zhu, Xiaofei
PY - 2015/8/9
Y1 - 2015/8/9
AB - Due to their first-hand, diverse and evolution-aware reflection of nearly all areas of life, web archives are emerging as gold mines for content analytics of many sorts. However, supporting search that goes beyond navigational search via URLs is a very challenging task in these unique structures with huge, redundant and noisy temporal content. In this paper, we address the search needs of expert users such as journalists, economists or historians for discovering a topic in time: given a query, the top-k returned results should give the best representative documents that cover the most interesting time periods for the topic. For this purpose, we propose a novel random walk-based model that integrates relevance, temporal authority, diversity and time in a unified framework. Preliminary experimental results on a large-scale, real-world web archival collection show that our method significantly improves on the state-of-the-art algorithms (i.e., PageRank) in ranking temporal web pages.
KW - Authority
KW - Diversity
KW - Temporal ranking
KW - Web archive
UR - http://www.scopus.com/inward/record.url?scp=84953775270&partnerID=8YFLogxK
U2 - 10.1145/2766462.2767832
DO - 10.1145/2766462.2767832
M3 - Conference contribution
AN - SCOPUS:84953775270
T3 - SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval
SP - 915
EP - 918
BT - SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval
T2 - 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2015
Y2 - 9 August 2015 through 13 August 2015
ER -