A Random Walk Model for Optimization of Search Impact in Web Frontier Ranking

Giang Tran; Ata Turk; B. Barla Cambazoglu; Wolfgang Nejdl

doi:10.1145/2766462.2767737

Details

Original language	English
Title of host publication	SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval
Pages	153-162
Number of pages	10
ISBN (electronic)	9781450336215
Publication status	Published - 9 Aug 2015
Event	38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2015 - Santiago, Chile Duration: 9 Aug 2015 → 13 Aug 2015

Abstract

Large-scale web search engines need to crawl the Web continuously to discover and download newly created web content. The speed at which the new content is discovered and the quality of the discovered content can have a big impact on the coverage and quality of the results provided by the search engine. In this paper, we propose a search-centric solution to the problem of prioritizing the pages in the frontier of a crawler for download. Our approach essentially orders the web pages in the frontier through a random walk model that takes into account the pages' potential impact on user-perceived search quality. In addition, we propose a link graph enrichment technique that extends this solution. Finally, we explore a machine learning approach that combines different frontier prioritization approaches. We conduct experiments using two very large, real-life web datasets to observe various search quality metrics. Comparisons with several baseline techniques indicate that the proposed approaches have the potential to improve the user-perceived quality of web search results considerably.

ASJC Scopus subject areas

Computer Science (all)
Information systems
Computer Science (all)
Software

Cite this

A Random Walk Model for Optimization of Search Impact in Web Frontier Ranking. / Tran, Giang; Turk, Ata; Cambazoglu, B. Barla et al.
SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2015. pp. 153-162.

Publication: Contribution to book/report/anthology/conference proceedings › Paper in conference proceedings › Research › Peer-reviewed

Tran, G, Turk, A, Cambazoglu, BB & Nejdl, W 2015, A Random Walk Model for Optimization of Search Impact in Web Frontier Ranking. in SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 153-162, 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2015, Santiago, Chile, 9 Aug 2015. https://doi.org/10.1145/2766462.2767737

Tran, G., Turk, A., Cambazoglu, B. B., & Nejdl, W. (2015). A Random Walk Model for Optimization of Search Impact in Web Frontier Ranking. In SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 153-162) https://doi.org/10.1145/2766462.2767737

Tran G, Turk A, Cambazoglu BB, Nejdl W. A Random Walk Model for Optimization of Search Impact in Web Frontier Ranking. in SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2015. pp. 153-162 doi: 10.1145/2766462.2767737

Tran, Giang ; Turk, Ata ; Cambazoglu, B. Barla et al. / A Random Walk Model for Optimization of Search Impact in Web Frontier Ranking. SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2015. pp. 153-162

Download

@inproceedings{a7464169c6e249ed89c077b6fbe6217c,

title = "A Random Walk Model for Optimization of Search Impact in Web Frontier Ranking",

abstract = "Large-scale web search engines need to crawl the Web continuously to discover and download newly created web content. The speed at which the new content is discovered and the quality of the discovered content can have a big impact on the coverage and quality of the results provided by the search engine. In this paper, we propose a search-centric solution to the problem of prioritizing the pages in the frontier of a crawler for download. Our approach essentially orders the web pages in the frontier through a random walk model that takes into account the pages' potential impact on user-perceived search quality. In addition, we propose a link graph enrichment technique that extends this solution. Finally, we explore a machine learning approach that combines different frontier prioritization approaches. We conduct experiments using two very large, real-life web datasets to observe various search quality metrics. Comparisons with several baseline techniques indicate that the proposed approaches have the potential to improve the user-perceived quality of web search results considerably.",

keywords = "Discovery, Frontier ranking, Random walks, Result relevance, URL prioritization, Web crawling, Web frontier, Web search engine",

author = "Giang Tran and Ata Turk and Cambazoglu, {B. Barla} and Wolfgang Nejdl",

note = "Funding information: This work was supported by the ERC Advanced Grant ALEXANDRIA (339233) and the LEADS project (ICT-318809), funded by the European Community.; 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2015 ; Conference date: 09-08-2015 Through 13-08-2015",

year = "2015",

month = aug,

day = "9",

doi = "10.1145/2766462.2767737",

language = "English",

pages = "153--162",

booktitle = "SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval",

}

Download

TY - GEN

T1 - A Random Walk Model for Optimization of Search Impact in Web Frontier Ranking

AU - Tran, Giang

AU - Turk, Ata

AU - Cambazoglu, B. Barla

AU - Nejdl, Wolfgang

N1 - Funding information: This work was supported by the ERC Advanced Grant ALEXANDRIA (339233) and the LEADS project (ICT-318809), funded by the European Community.

PY - 2015/8/9

Y1 - 2015/8/9

N2 - Large-scale web search engines need to crawl the Web continuously to discover and download newly created web content. The speed at which the new content is discovered and the quality of the discovered content can have a big impact on the coverage and quality of the results provided by the search engine. In this paper, we propose a search-centric solution to the problem of prioritizing the pages in the frontier of a crawler for download. Our approach essentially orders the web pages in the frontier through a random walk model that takes into account the pages' potential impact on user-perceived search quality. In addition, we propose a link graph enrichment technique that extends this solution. Finally, we explore a machine learning approach that combines different frontier prioritization approaches. We conduct experiments using two very large, real-life web datasets to observe various search quality metrics. Comparisons with several baseline techniques indicate that the proposed approaches have the potential to improve the user-perceived quality of web search results considerably.

AB - Large-scale web search engines need to crawl the Web continuously to discover and download newly created web content. The speed at which the new content is discovered and the quality of the discovered content can have a big impact on the coverage and quality of the results provided by the search engine. In this paper, we propose a search-centric solution to the problem of prioritizing the pages in the frontier of a crawler for download. Our approach essentially orders the web pages in the frontier through a random walk model that takes into account the pages' potential impact on user-perceived search quality. In addition, we propose a link graph enrichment technique that extends this solution. Finally, we explore a machine learning approach that combines different frontier prioritization approaches. We conduct experiments using two very large, real-life web datasets to observe various search quality metrics. Comparisons with several baseline techniques indicate that the proposed approaches have the potential to improve the user-perceived quality of web search results considerably.

KW - Discovery

KW - Frontier ranking

KW - Random walks

KW - Result relevance

KW - URL prioritization

KW - Web crawling

KW - Web frontier

KW - Web search engine

UR - http://www.scopus.com/inward/record.url?scp=84953711427&partnerID=8YFLogxK

U2 - 10.1145/2766462.2767737

DO - 10.1145/2766462.2767737

M3 - Conference contribution

AN - SCOPUS:84953711427

SP - 153

EP - 162

BT - SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval

T2 - 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2015

Y2 - 9 August 2015 through 13 August 2015

ER -
