A time-aware random walk model for finding important documents in web archives

Research output: Chapter in book/report/conference proceeding, Conference contribution, Research, peer review

Authors

  • Tu Ngoc Nguyen
  • Nattiya Kanhabua
  • Claudia Niederée
  • Xiaofei Zhu

Details

Original language: English
Title of host publication: SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval
Pages: 915-918
Number of pages: 4
ISBN (electronic): 9781450336215
Publication status: Published - 9 Aug 2015
Event: 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2015 - Santiago, Chile
Duration: 9 Aug 2015 to 13 Aug 2015

Publication series

Name: SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval

Abstract

Because they offer a first-hand, diverse and evolution-aware reflection of nearly all areas of life, web archives are emerging as gold mines for content analytics of many sorts. However, supporting search that goes beyond navigational search via URLs is very challenging in these unique structures, with their huge, redundant and noisy temporal content. In this paper, we address the search needs of expert users such as journalists, economists or historians for discovering a topic in time: given a query, the top-k returned results should be the most representative documents, covering the most interesting time periods for the topic. For this purpose, we propose a novel random walk-based model that integrates relevance, temporal authority, diversity and time in a unified framework. Preliminary experimental results on a large-scale, real-world web archival collection show that our method significantly improves on the state-of-the-art algorithms (i.e., PageRank) in ranking temporal web pages.
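
For orientation, the PageRank baseline named in the abstract can be sketched as a random walk computed by power iteration. This is only a minimal illustration of that baseline, not the model proposed in the paper: the function name, the damping value, the toy link graph and the snapshot identifiers are all assumptions made for the example. The optional teleport argument is a hypothetical hook showing where a non-uniform prior (for instance, query relevance or a time weight) could bias the walk.

```python
def pagerank(links, teleport=None, damping=0.85, iterations=50):
    """Plain PageRank by power iteration (illustrative sketch only).

    links:    dict mapping each node to a list of nodes it links to.
    teleport: optional dict of non-negative jump weights per node
              (hypothetical hook for a relevance or time prior).
    """
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    if teleport is None:
        teleport = {u: 1.0 for u in nodes}  # uniform random jump
    total = sum(teleport.get(u, 0.0) for u in nodes)
    jump = {u: teleport.get(u, 0.0) / total for u in nodes}

    rank = dict(jump)  # start the walk from the jump distribution
    for _ in range(iterations):
        # Nodes without out-links hand their mass to the jump distribution.
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        nxt = {u: (1.0 - damping + damping * dangling) * jump[u] for u in nodes}
        for u in nodes:
            outs = links.get(u, [])
            for v in outs:
                nxt[v] += damping * rank[u] / len(outs)
        rank = nxt
    return rank


# Toy graph of archived page snapshots (identifiers are made up).
toy_links = {
    "page_2012": ["page_2013"],
    "page_2013": ["page_2012", "page_2014"],
    "page_2014": [],
}
scores = pagerank(toy_links)
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 4))
```

The scores always sum to 1, so they can be read as the stationary probability of the walker being at each snapshot. How the paper combines relevance, temporal authority, diversity and time is not specified in this record, so none of that is reproduced here.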

Keywords

    Authority, Diversity, Temporal ranking, Web archive

Cite this

A time-aware random walk model for finding important documents in web archives. / Nguyen, Tu Ngoc; Kanhabua, Nattiya; Niederée, Claudia et al.
SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2015. p. 915-918 (SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval).

Nguyen, TN, Kanhabua, N, Niederée, C & Zhu, X 2015, A time-aware random walk model for finding important documents in web archives. in SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 915-918, 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2015, Santiago, Chile, 9 Aug 2015. https://doi.org/10.1145/2766462.2767832
Nguyen, T. N., Kanhabua, N., Niederée, C., & Zhu, X. (2015). A time-aware random walk model for finding important documents in web archives. In SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 915-918). (SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval). https://doi.org/10.1145/2766462.2767832
Nguyen TN, Kanhabua N, Niederée C, Zhu X. A time-aware random walk model for finding important documents in web archives. In SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2015. p. 915-918. (SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval). doi: 10.1145/2766462.2767832
Nguyen, Tu Ngoc ; Kanhabua, Nattiya ; Niederée, Claudia et al. / A time-aware random walk model for finding important documents in web archives. SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2015. pp. 915-918 (SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval).
@inproceedings{589a86feb167449f861511d3d51e4d5c,
title = "A time-aware random walk model for finding important documents in web archives",
abstract = "Because they offer a first-hand, diverse and evolution-aware reflection of nearly all areas of life, web archives are emerging as gold mines for content analytics of many sorts. However, supporting search that goes beyond navigational search via URLs is very challenging in these unique structures, with their huge, redundant and noisy temporal content. In this paper, we address the search needs of expert users such as journalists, economists or historians for discovering a topic in time: given a query, the top-k returned results should be the most representative documents, covering the most interesting time periods for the topic. For this purpose, we propose a novel random walk-based model that integrates relevance, temporal authority, diversity and time in a unified framework. Preliminary experimental results on a large-scale, real-world web archival collection show that our method significantly improves on the state-of-the-art algorithms (i.e., PageRank) in ranking temporal web pages.",
keywords = "Authority, Diversity, Temporal ranking, Web archive",
author = "Nguyen, {Tu Ngoc} and Nattiya Kanhabua and Claudia Nieder{\'e}e and Xiaofei Zhu",
year = "2015",
month = aug,
day = "9",
doi = "10.1145/2766462.2767832",
language = "English",
series = "SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval",
pages = "915--918",
booktitle = "SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval",
note = "38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2015 ; Conference date: 09-08-2015 Through 13-08-2015",

}

TY - GEN

T1 - A time-aware random walk model for finding important documents in web archives

AU - Nguyen, Tu Ngoc

AU - Kanhabua, Nattiya

AU - Niederée, Claudia

AU - Zhu, Xiaofei

PY - 2015/8/9

Y1 - 2015/8/9

AB - Because they offer a first-hand, diverse and evolution-aware reflection of nearly all areas of life, web archives are emerging as gold mines for content analytics of many sorts. However, supporting search that goes beyond navigational search via URLs is very challenging in these unique structures, with their huge, redundant and noisy temporal content. In this paper, we address the search needs of expert users such as journalists, economists or historians for discovering a topic in time: given a query, the top-k returned results should be the most representative documents, covering the most interesting time periods for the topic. For this purpose, we propose a novel random walk-based model that integrates relevance, temporal authority, diversity and time in a unified framework. Preliminary experimental results on a large-scale, real-world web archival collection show that our method significantly improves on the state-of-the-art algorithms (i.e., PageRank) in ranking temporal web pages.

KW - Authority

KW - Diversity

KW - Temporal ranking

KW - Web archive

UR - http://www.scopus.com/inward/record.url?scp=84953775270&partnerID=8YFLogxK

U2 - 10.1145/2766462.2767832

DO - 10.1145/2766462.2767832

M3 - Conference contribution

AN - SCOPUS:84953775270

T3 - SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval

SP - 915

EP - 918

BT - SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval

T2 - 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2015

Y2 - 9 August 2015 through 13 August 2015

ER -