A Random Walk Model for Optimization of Search Impact in Web Frontier Ranking

Publication: Contribution to book/report/anthology/conference proceedings › Conference contribution › Research › Peer-reviewed

Authors

  • Giang Tran
  • Ata Turk
  • B. Barla Cambazoglu
  • Wolfgang Nejdl

External organizations

  • Boston University (BU)
  • Yahoo Research Labs

Details

Original language: English
Title of host publication: SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval
Pages: 153-162
Number of pages: 10
ISBN (electronic): 9781450336215
Publication status: Published - 9 Aug. 2015
Event: 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2015 - Santiago, Chile
Duration: 9 Aug. 2015 to 13 Aug. 2015

Abstract

Large-scale web search engines need to crawl the Web continuously to discover and download newly created web content. The speed at which the new content is discovered and the quality of the discovered content can have a big impact on the coverage and quality of the results provided by the search engine. In this paper, we propose a search-centric solution to the problem of prioritizing the pages in the frontier of a crawler for download. Our approach essentially orders the web pages in the frontier through a random walk model that takes into account the pages' potential impact on user-perceived search quality. In addition, we propose a link graph enrichment technique that extends this solution. Finally, we explore a machine learning approach that combines different frontier prioritization approaches. We conduct experiments using two very large, real-life web datasets to observe various search quality metrics. Comparisons with several baseline techniques indicate that the proposed approaches have the potential to improve the user-perceived quality of web search results considerably.

Cite this

A Random Walk Model for Optimization of Search Impact in Web Frontier Ranking. / Tran, Giang; Turk, Ata; Cambazoglu, B. Barla et al.
SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2015. pp. 153-162.

Tran, G, Turk, A, Cambazoglu, BB & Nejdl, W 2015, A Random Walk Model for Optimization of Search Impact in Web Frontier Ranking. in SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 153-162, 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2015, Santiago, Chile, 9 Aug. 2015. https://doi.org/10.1145/2766462.2767737
Tran, G., Turk, A., Cambazoglu, B. B., & Nejdl, W. (2015). A Random Walk Model for Optimization of Search Impact in Web Frontier Ranking. In SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 153-162). https://doi.org/10.1145/2766462.2767737
Tran G, Turk A, Cambazoglu BB, Nejdl W. A Random Walk Model for Optimization of Search Impact in Web Frontier Ranking. in SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2015. pp. 153-162. doi: 10.1145/2766462.2767737
Tran, Giang ; Turk, Ata ; Cambazoglu, B. Barla et al. / A Random Walk Model for Optimization of Search Impact in Web Frontier Ranking. SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2015. pp. 153-162.
@inproceedings{a7464169c6e249ed89c077b6fbe6217c,
title = "A Random Walk Model for Optimization of Search Impact in Web Frontier Ranking",
abstract = "Large-scale web search engines need to crawl the Web continuously to discover and download newly created web content. The speed at which the new content is discovered and the quality of the discovered content can have a big impact on the coverage and quality of the results provided by the search engine. In this paper, we propose a search-centric solution to the problem of prioritizing the pages in the frontier of a crawler for download. Our approach essentially orders the web pages in the frontier through a random walk model that takes into account the pages' potential impact on user-perceived search quality. In addition, we propose a link graph enrichment technique that extends this solution. Finally, we explore a machine learning approach that combines different frontier prioritization approaches. We conduct experiments using two very large, real-life web datasets to observe various search quality metrics. Comparisons with several baseline techniques indicate that the proposed approaches have the potential to improve the user-perceived quality of web search results considerably.",
keywords = "Discovery, Frontier ranking, Random walks, Result relevance, URL prioritization, Web crawling, Web frontier, Web search engine",
author = "Giang Tran and Ata Turk and Cambazoglu, {B. Barla} and Wolfgang Nejdl",
note = "Funding information: This work was supported by the ERC Advanced Grant ALEXANDRIA (339233) and the LEADS project (ICT-318809), funded by the European Community.; 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2015 ; Conference date: 09-08-2015 Through 13-08-2015",
year = "2015",
month = aug,
day = "9",
doi = "10.1145/2766462.2767737",
language = "English",
pages = "153--162",
booktitle = "SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval",

}

TY - GEN
T1 - A Random Walk Model for Optimization of Search Impact in Web Frontier Ranking
AU - Tran, Giang
AU - Turk, Ata
AU - Cambazoglu, B. Barla
AU - Nejdl, Wolfgang
N1 - Funding information: This work was supported by the ERC Advanced Grant ALEXANDRIA (339233) and the LEADS project (ICT-318809), funded by the European Community.
PY - 2015/8/9
Y1 - 2015/8/9
AB - Large-scale web search engines need to crawl the Web continuously to discover and download newly created web content. The speed at which the new content is discovered and the quality of the discovered content can have a big impact on the coverage and quality of the results provided by the search engine. In this paper, we propose a search-centric solution to the problem of prioritizing the pages in the frontier of a crawler for download. Our approach essentially orders the web pages in the frontier through a random walk model that takes into account the pages' potential impact on user-perceived search quality. In addition, we propose a link graph enrichment technique that extends this solution. Finally, we explore a machine learning approach that combines different frontier prioritization approaches. We conduct experiments using two very large, real-life web datasets to observe various search quality metrics. Comparisons with several baseline techniques indicate that the proposed approaches have the potential to improve the user-perceived quality of web search results considerably.

KW - Discovery
KW - Frontier ranking
KW - Random walks
KW - Result relevance
KW - URL prioritization
KW - Web crawling
KW - Web frontier
KW - Web search engine
UR - http://www.scopus.com/inward/record.url?scp=84953711427&partnerID=8YFLogxK
U2 - 10.1145/2766462.2767737
DO - 10.1145/2766462.2767737
M3 - Conference contribution
AN - SCOPUS:84953711427
SP - 153
EP - 162
BT - SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval
T2 - 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2015
Y2 - 9 August 2015 through 13 August 2015
ER -
