Details
Original language | English |
---|---|
Title of host publication | SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval |
Pages | 153-162 |
Number of pages | 10 |
ISBN (electronic) | 9781450336215 |
Publication status | Published - 9 Aug 2015 |
Event | 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2015 - Santiago, Chile. Duration: 9 Aug 2015 → 13 Aug 2015 |
Abstract
Large-scale web search engines need to crawl the Web continuously to discover and download newly created web content. The speed at which the new content is discovered and the quality of the discovered content can have a big impact on the coverage and quality of the results provided by the search engine. In this paper, we propose a search-centric solution to the problem of prioritizing the pages in the frontier of a crawler for download. Our approach essentially orders the web pages in the frontier through a random walk model that takes into account the pages' potential impact on user-perceived search quality. In addition, we propose a link graph enrichment technique that extends this solution. Finally, we explore a machine learning approach that combines different frontier prioritization approaches. We conduct experiments using two very large, real-life web datasets to observe various search quality metrics. Comparisons with several baseline techniques indicate that the proposed approaches have the potential to improve the user-perceived quality of web search results considerably.
ASJC Scopus subject areas
- Computer Science (all): Information systems
- Computer Science (all): Software
Cite this
SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2015. pp. 153-162.
Publication: Chapter in book/report/conference proceeding › Conference contribution › Research › Peer-reviewed
TY - GEN
T1 - A Random Walk Model for Optimization of Search Impact in Web Frontier Ranking
AU - Tran, Giang
AU - Turk, Ata
AU - Cambazoglu, B. Barla
AU - Nejdl, Wolfgang
N1 - Funding information: This work was supported by the ERC Advanced Grant ALEXANDRIA (339233) and the LEADS project (ICT-318809), funded by the European Community.
PY - 2015/8/9
Y1 - 2015/8/9
AB - Large-scale web search engines need to crawl the Web continuously to discover and download newly created web content. The speed at which the new content is discovered and the quality of the discovered content can have a big impact on the coverage and quality of the results provided by the search engine. In this paper, we propose a search-centric solution to the problem of prioritizing the pages in the frontier of a crawler for download. Our approach essentially orders the web pages in the frontier through a random walk model that takes into account the pages' potential impact on user-perceived search quality. In addition, we propose a link graph enrichment technique that extends this solution. Finally, we explore a machine learning approach that combines different frontier prioritization approaches. We conduct experiments using two very large, real-life web datasets to observe various search quality metrics. Comparisons with several baseline techniques indicate that the proposed approaches have the potential to improve the user-perceived quality of web search results considerably.
KW - Discovery
KW - Frontier ranking
KW - Random walks
KW - Result relevance
KW - URL prioritization
KW - Web crawling
KW - Web frontier
KW - Web search engine
UR - http://www.scopus.com/inward/record.url?scp=84953711427&partnerID=8YFLogxK
U2 - 10.1145/2766462.2767737
DO - 10.1145/2766462.2767737
M3 - Conference contribution
AN - SCOPUS:84953711427
SP - 153
EP - 162
BT - SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval
T2 - 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2015
Y2 - 9 August 2015 through 13 August 2015
ER -