A time-aware random walk model for finding important documents in web archives

Publication: Contribution to book/report/anthology/conference proceedings; conference paper; research; peer-reviewed

Authors

  • Tu Ngoc Nguyen
  • Nattiya Kanhabua
  • Claudia Niederée
  • Xiaofei Zhu

Details

Original language: English
Title of host publication: SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval
Pages: 915-918
Number of pages: 4
ISBN (electronic): 9781450336215
Publication status: Published - 9 Aug. 2015
Event: 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2015 - Santiago, Chile
Duration: 9 Aug. 2015 to 13 Aug. 2015

Publication series

Name: SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval

Abstract

Due to their first-hand, diverse and evolution-aware reflection of nearly all areas of life, web archives are emerging as gold mines for content analytics of many sorts. However, supporting search that goes beyond navigational search via URLs is a very challenging task in these unique structures with huge, redundant and noisy temporal content. In this paper, we address the search needs of expert users such as journalists, economists or historians for discovering a topic in time: given a query, the top-k returned results should be the most representative documents that cover the most interesting time periods for the topic. For this purpose, we propose a novel random walk-based model that integrates relevance, temporal authority, diversity and time in a unified framework. Preliminary experimental results on a large-scale, real-world web archival collection show that our method significantly improves on state-of-the-art algorithms (i.e., PageRank) in ranking temporal web pages.
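
The abstract does not spell out the model itself, so the following is only a generic illustration of how a random walk can be biased by relevance and time. It is a minimal sketch of personalized PageRank in which the teleport distribution is proportional to a per-document relevance score multiplied by a per-document time weight. It is not the authors' model, and it does not address diversity. The function name `time_biased_pagerank`, the `time_weight` input and all example values are hypothetical.

```python
def time_biased_pagerank(links, relevance, time_weight, damping=0.85, iterations=50):
    """Personalized PageRank with a relevance- and time-biased teleport vector.

    links:       dict mapping a document id to the list of ids it links to
    relevance:   dict mapping a document id to a query relevance score (assumed given)
    time_weight: dict mapping a document id to a weight for its time period (assumed given)
    """
    nodes = sorted(relevance)
    # Teleport weight: relevance times time weight (an assumed combination).
    raw = {n: relevance[n] * time_weight[n] for n in nodes}
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("teleport weights must have a positive sum")
    teleport = {n: raw[n] / total for n in nodes}

    rank = dict(teleport)
    for _ in range(iterations):
        # With probability (1 - damping) the walker jumps via the teleport vector.
        nxt = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for src in nodes:
            out = links.get(src, [])
            if out:
                # Otherwise it follows one of the outgoing links uniformly.
                share = damping * rank[src] / len(out)
                for dst in out:
                    nxt[dst] += share
            else:
                # Dangling document: spread its mass along the teleport vector.
                for n in nodes:
                    nxt[n] += damping * rank[src] * teleport[n]
        rank = nxt
    return rank


# Toy example with made-up scores for three archived pages.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
relevance = {"a": 0.9, "b": 0.5, "c": 0.7}
time_weight = {"a": 1.0, "b": 0.2, "c": 0.6}
scores = time_biased_pagerank(links, relevance, time_weight)
top_k = sorted(scores, key=scores.get, reverse=True)[:2]
```

Putting the bias in the teleport vector keeps the link-following step unchanged, so setting all relevance and time weights equal recovers ordinary PageRank.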

Cite this

A time-aware random walk model for finding important documents in web archives. / Nguyen, Tu Ngoc; Kanhabua, Nattiya; Niederée, Claudia et al.
SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2015. pp. 915-918 (SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval).

Nguyen, TN, Kanhabua, N, Niederée, C & Zhu, X 2015, A time-aware random walk model for finding important documents in web archives. in SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 915-918, 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2015, Santiago, Chile, 9 Aug. 2015. https://doi.org/10.1145/2766462.2767832
Nguyen, T. N., Kanhabua, N., Niederée, C., & Zhu, X. (2015). A time-aware random walk model for finding important documents in web archives. In SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 915-918). (SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval). https://doi.org/10.1145/2766462.2767832
Nguyen TN, Kanhabua N, Niederée C, Zhu X. A time-aware random walk model for finding important documents in web archives. in SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2015. pp. 915-918. (SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval). doi: 10.1145/2766462.2767832
Nguyen, Tu Ngoc ; Kanhabua, Nattiya ; Niederée, Claudia et al. / A time-aware random walk model for finding important documents in web archives. SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2015. pp. 915-918 (SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval).
@inproceedings{589a86feb167449f861511d3d51e4d5c,
title = "A time-aware random walk model for finding important documents in web archives",
abstract = "Due to their first-hand, diverse and evolution-aware reflection of nearly all areas of life, web archives are emerging as gold mines for content analytics of many sorts. However, supporting search that goes beyond navigational search via URLs is a very challenging task in these unique structures with huge, redundant and noisy temporal content. In this paper, we address the search needs of expert users such as journalists, economists or historians for discovering a topic in time: given a query, the top-k returned results should be the most representative documents that cover the most interesting time periods for the topic. For this purpose, we propose a novel random walk-based model that integrates relevance, temporal authority, diversity and time in a unified framework. Preliminary experimental results on a large-scale, real-world web archival collection show that our method significantly improves on state-of-the-art algorithms (i.e., PageRank) in ranking temporal web pages.",
keywords = "Authority, Diversity, Temporal ranking, Web archive",
author = "Nguyen, {Tu Ngoc} and Nattiya Kanhabua and Claudia Nieder{\'e}e and Xiaofei Zhu",
year = "2015",
month = aug,
day = "9",
doi = "10.1145/2766462.2767832",
language = "English",
series = "SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval",
pages = "915--918",
booktitle = "SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval",
note = "38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2015 ; Conference date: 09-08-2015 Through 13-08-2015",

}

TY - GEN

T1 - A time-aware random walk model for finding important documents in web archives

AU - Nguyen, Tu Ngoc

AU - Kanhabua, Nattiya

AU - Niederée, Claudia

AU - Zhu, Xiaofei

PY - 2015/8/9

Y1 - 2015/8/9

AB - Due to their first-hand, diverse and evolution-aware reflection of nearly all areas of life, web archives are emerging as gold mines for content analytics of many sorts. However, supporting search that goes beyond navigational search via URLs is a very challenging task in these unique structures with huge, redundant and noisy temporal content. In this paper, we address the search needs of expert users such as journalists, economists or historians for discovering a topic in time: given a query, the top-k returned results should be the most representative documents that cover the most interesting time periods for the topic. For this purpose, we propose a novel random walk-based model that integrates relevance, temporal authority, diversity and time in a unified framework. Preliminary experimental results on a large-scale, real-world web archival collection show that our method significantly improves on state-of-the-art algorithms (i.e., PageRank) in ranking temporal web pages.

KW - Authority

KW - Diversity

KW - Temporal ranking

KW - Web archive

UR - http://www.scopus.com/inward/record.url?scp=84953775270&partnerID=8YFLogxK

U2 - 10.1145/2766462.2767832

DO - 10.1145/2766462.2767832

M3 - Conference contribution

AN - SCOPUS:84953775270

T3 - SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval

SP - 915

EP - 918

BT - SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval

T2 - 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2015

Y2 - 9 August 2015 through 13 August 2015

ER -