A time-aware random walk model for finding important documents in web archives

Tu Ngoc Nguyen; Nattiya Kanhabua; Claudia Niederée; Xiaofei Zhu

doi:10.1145/2766462.2767832

Details

Original language	English
Title of host publication	SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval
Pages	915-918
Number of pages	4
ISBN (electronic)	9781450336215
Publication status	Published - 9 Aug 2015
Event	38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2015 - Santiago, Chile Duration: 9 Aug 2015 → 13 Aug 2015

Publication series

Name	SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval

Abstract

Due to their first-hand, diverse and evolution-aware reflection of nearly all areas of life, web archives are emerging as gold mines for content analytics of many sorts. However, supporting search that goes beyond navigational search via URLs is a very challenging task in these unique structures with huge, redundant and noisy temporal content. In this paper, we address the search needs of expert users such as journalists, economists or historians for discovering a topic in time: given a query, the top-k returned results should give the best representative documents that cover the most interesting time periods for the topic. For this purpose, we propose a novel random walk-based model that integrates relevance, temporal authority, diversity and time in a unified framework. Preliminary experimental results on a large-scale, real-world web archival collection show that our method significantly improves on state-of-the-art algorithms (i.e., PageRank) in ranking temporal web pages.

ASJC Scopus subject areas

Computer Science (all)
Information Systems
Computer Science (all)
Software

Cite this

A time-aware random walk model for finding important documents in web archives. / Nguyen, Tu Ngoc; Kanhabua, Nattiya; Niederée, Claudia et al.
SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2015. pp. 915-918 (SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval).

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › Peer review

Nguyen, TN, Kanhabua, N, Niederée, C & Zhu, X 2015, A time-aware random walk model for finding important documents in web archives. in SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 915-918, 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2015, Santiago, Chile, 9 Aug 2015. https://doi.org/10.1145/2766462.2767832

Nguyen, T. N., Kanhabua, N., Niederée, C., & Zhu, X. (2015). A time-aware random walk model for finding important documents in web archives. In SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 915-918). (SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval). https://doi.org/10.1145/2766462.2767832

Nguyen TN, Kanhabua N, Niederée C, Zhu X. A time-aware random walk model for finding important documents in web archives. in SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2015. pp. 915-918. (SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval). doi: 10.1145/2766462.2767832

Nguyen, Tu Ngoc ; Kanhabua, Nattiya ; Niederée, Claudia et al. / A time-aware random walk model for finding important documents in web archives. SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2015. pp. 915-918 (SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval).


@inproceedings{589a86feb167449f861511d3d51e4d5c,
title = "A time-aware random walk model for finding important documents in web archives",
abstract = "Due to their first-hand, diverse and evolution-aware reflection of nearly all areas of life, web archives are emerging as gold-mines for content analytics of many sorts. However, supporting search, which goes beyond navigational search via URLs, is a very challenging task in these unique structures with huge, redundant and noisy temporal content. In this paper, we address the search needs of expert users such as journalists, economists or historians for discovering a topic in time: Given a query, the top-k returned results should give the best representative documents that cover most interesting time-periods for the topic. For this purpose, we propose a novel random walk-based model that integrates relevance, temporal authority, diversity and time in a unified framework. Preliminary experimental results on a large-scale, real-world web archival collection shows that our method significantly improves the state-of-the-art algorithms (i.e., PageRank) in ranking temporal web pages.",
keywords = "Authority, Diversity, Temporal ranking, Web archive",
author = "Nguyen, {Tu Ngoc} and Nattiya Kanhabua and Claudia Nieder{\'e}e and Xiaofei Zhu",
year = "2015",
month = aug,
day = "9",
doi = "10.1145/2766462.2767832",
language = "English",
series = "SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval",
pages = "915--918",
booktitle = "SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval",
note = "38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2015 ; Conference date: 09-08-2015 Through 13-08-2015",
}


TY  - GEN
T1  - A time-aware random walk model for finding important documents in web archives
AU  - Nguyen, Tu Ngoc
AU  - Kanhabua, Nattiya
AU  - Niederée, Claudia
AU  - Zhu, Xiaofei
PY  - 2015/8/9
Y1  - 2015/8/9
N2  - Due to their first-hand, diverse and evolution-aware reflection of nearly all areas of life, web archives are emerging as gold-mines for content analytics of many sorts. However, supporting search, which goes beyond navigational search via URLs, is a very challenging task in these unique structures with huge, redundant and noisy temporal content. In this paper, we address the search needs of expert users such as journalists, economists or historians for discovering a topic in time: Given a query, the top-k returned results should give the best representative documents that cover most interesting time-periods for the topic. For this purpose, we propose a novel random walk-based model that integrates relevance, temporal authority, diversity and time in a unified framework. Preliminary experimental results on a large-scale, real-world web archival collection shows that our method significantly improves the state-of-the-art algorithms (i.e., PageRank) in ranking temporal web pages.
AB  - Due to their first-hand, diverse and evolution-aware reflection of nearly all areas of life, web archives are emerging as gold-mines for content analytics of many sorts. However, supporting search, which goes beyond navigational search via URLs, is a very challenging task in these unique structures with huge, redundant and noisy temporal content. In this paper, we address the search needs of expert users such as journalists, economists or historians for discovering a topic in time: Given a query, the top-k returned results should give the best representative documents that cover most interesting time-periods for the topic. For this purpose, we propose a novel random walk-based model that integrates relevance, temporal authority, diversity and time in a unified framework. Preliminary experimental results on a large-scale, real-world web archival collection shows that our method significantly improves the state-of-the-art algorithms (i.e., PageRank) in ranking temporal web pages.
KW  - Authority
KW  - Diversity
KW  - Temporal ranking
KW  - Web archive
UR  - http://www.scopus.com/inward/record.url?scp=84953775270&partnerID=8YFLogxK
U2  - 10.1145/2766462.2767832
DO  - 10.1145/2766462.2767832
M3  - Conference contribution
AN  - SCOPUS:84953775270
T3  - SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval
SP  - 915
EP  - 918
BT  - SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval
T2  - 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2015
Y2  - 9 August 2015 through 13 August 2015
ER  -
