Using Site-Level Connections to Estimate Link Confidence

Research output: Contribution to journal › Article › Research › peer review

Authors

  • Jucimar Souza
  • André Carvalho
  • Marco Cristo
  • Edleno Moura
  • Pavel Calado
  • Paul Alexandru Chirita
  • Wolfgang Nejdl

External Research Organisations

  • Universidade Federal do Amazonas
  • INESC-ID
  • Adobe Systems Incorporated

Details

Original language: English
Pages (from-to): 2294-2312
Number of pages: 19
Journal: Journal of the American Society for Information Science and Technology
Volume: 63
Issue number: 11
Publication status: Published - 16 Oct 2012

Abstract

Search engines are essential tools for web users today. They rely on a large number of features to compute the rank of search results for each given query. The estimated reputation of pages is among the effective features available for search engine designers, probably being adopted by most current commercial search engines. Page reputation is estimated by analyzing the linkage relationships between pages. This information is used by link analysis algorithms as a query-independent feature, to be taken into account when computing the rank of the results. Unfortunately, several types of links found on the web may damage the estimated page reputation and thus cause a negative effect on the quality of search results. This work studies alternatives to reduce the negative impact of such noisy links. More specifically, the authors propose and evaluate new methods that deal with noisy links, considering scenarios where the reputation of pages is computed using the PageRank algorithm. They show, through experiments with real web content, that their methods achieve significant improvements when compared to previous solutions proposed in the literature.
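The abstract describes page reputation computed by PageRank and the goal of reducing the influence of noisy links, without giving the formulas. The Python sketch below illustrates one common way to fold per-link confidence into PageRank, assuming each link carries a non-negative weight and each page splits its rank among its out-links in proportion to those weights. The `link_confidence` heuristic, which lowers the weight of links between two different sites that link back to each other, is a hypothetical stand-in used only for illustration. It is not the method proposed in the paper, and the function names, the host-name notion of "site", and the 0.5 factor are all assumptions.

```python
from collections import defaultdict
from urllib.parse import urlparse


def site_of(url):
    """Hypothetical helper: use the URL's host name as its 'site'."""
    return urlparse(url).netloc.lower()


def link_confidence(links, reciprocal_weight=0.5):
    """Hypothetical heuristic, NOT the paper's method.

    Every link gets confidence 1.0, except links between two different
    sites that also link back to each other at the site level; those get
    `reciprocal_weight`, a crude stand-in for mutually reinforcing links.
    """
    site_edges = {(site_of(s), site_of(d)) for s, d in links}
    conf = {}
    for s, d in links:
        a, b = site_of(s), site_of(d)
        reciprocal = a != b and (b, a) in site_edges
        conf[(s, d)] = reciprocal_weight if reciprocal else 1.0
    return conf


def weighted_pagerank(links, weights, damping=0.85, max_iter=100, tol=1e-10):
    """PageRank by power iteration where each page splits its rank among
    its out-links in proportion to non-negative link weights.

    Links missing from `weights` default to weight 1.0. Pages whose
    out-link weights sum to zero (including pages with no out-links) are
    treated as dangling and spread their rank uniformly over all pages.
    """
    links = list(dict.fromkeys(links))  # drop duplicate (source, target) pairs
    nodes = sorted({page for edge in links for page in edge})
    if not nodes:
        return {}
    n = len(nodes)

    # Total outgoing weight per source page, used to normalize each link.
    out_weight = defaultdict(float)
    for edge in links:
        out_weight[edge[0]] += weights.get(edge, 1.0)

    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Teleportation plus uniformly redistributed dangling rank.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0.0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        for s, d in links:
            w = weights.get((s, d), 1.0)
            if w > 0.0:  # w > 0 implies out_weight[s] > 0
                new_rank[d] += damping * rank[s] * w / out_weight[s]
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank
```

With every weight equal to 1.0 this reduces to ordinary PageRank. Because weights are normalized per source page, lowering a link's weight shifts that page's rank toward its other out-links; down-weighting a page's only out-link has no effect, while setting it to 0.0 makes the page dangling.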

Keywords

    information retrieval software, information storage and retrieval systems, search engines

Cite this

Using Site-Level Connections to Estimate Link Confidence. / Souza, Jucimar; Carvalho, André; Cristo, Marco et al.
In: Journal of the American Society for Information Science and Technology, Vol. 63, No. 11, 16.10.2012, p. 2294-2312.

Souza J, Carvalho A, Cristo M, Moura E, Calado P, Chirita PA et al. Using Site-Level Connections to Estimate Link Confidence. Journal of the American Society for Information Science and Technology. 2012 Oct 16;63(11):2294-2312. doi: 10.1002/asi.22729
Souza, Jucimar ; Carvalho, André ; Cristo, Marco et al. / Using Site-Level Connections to Estimate Link Confidence. In: Journal of the American Society for Information Science and Technology. 2012 ; Vol. 63, No. 11. pp. 2294-2312.
@article{2354b992202c422988cd686830a19db7,
title = "Using Site-Level Connections to Estimate Link Confidence",
abstract = "Search engines are essential tools for web users today. They rely on a large number of features to compute the rank of search results for each given query. The estimated reputation of pages is among the effective features available for search engine designers, probably being adopted by most current commercial search engines. Page reputation is estimated by analyzing the linkage relationships between pages. This information is used by link analysis algorithms as a query-independent feature, to be taken into account when computing the rank of the results. Unfortunately, several types of links found on the web may damage the estimated page reputation and thus cause a negative effect on the quality of search results. This work studies alternatives to reduce the negative impact of such noisy links. More specifically, the authors propose and evaluate new methods that deal with noisy links, considering scenarios where the reputation of pages is computed using the PageRank algorithm. They show, through experiments with real web content, that their methods achieve significant improvements when compared to previous solutions proposed in the literature.",
keywords = "information retrieval software, information storage and retrieval systems, search engines",
author = "Jucimar Souza and Andr{\'e} Carvalho and Marco Cristo and Edleno Moura and Pavel Calado and Chirita, {Paul Alexandru} and Wolfgang Nejdl",
year = "2012",
month = oct,
day = "16",
doi = "10.1002/asi.22729",
language = "English",
volume = "63",
pages = "2294--2312",
journal = "Journal of the American Society for Information Science and Technology",
issn = "1532-2882",
publisher = "John Wiley and Sons Inc.",
number = "11",

}

TY  - JOUR
T1  - Using Site-Level Connections to Estimate Link Confidence
AU  - Souza, Jucimar
AU  - Carvalho, André
AU  - Cristo, Marco
AU  - Moura, Edleno
AU  - Calado, Pavel
AU  - Chirita, Paul Alexandru
AU  - Nejdl, Wolfgang
PY  - 2012/10/16
Y1  - 2012/10/16
N2  - Search engines are essential tools for web users today. They rely on a large number of features to compute the rank of search results for each given query. The estimated reputation of pages is among the effective features available for search engine designers, probably being adopted by most current commercial search engines. Page reputation is estimated by analyzing the linkage relationships between pages. This information is used by link analysis algorithms as a query-independent feature, to be taken into account when computing the rank of the results. Unfortunately, several types of links found on the web may damage the estimated page reputation and thus cause a negative effect on the quality of search results. This work studies alternatives to reduce the negative impact of such noisy links. More specifically, the authors propose and evaluate new methods that deal with noisy links, considering scenarios where the reputation of pages is computed using the PageRank algorithm. They show, through experiments with real web content, that their methods achieve significant improvements when compared to previous solutions proposed in the literature.
AB  - Search engines are essential tools for web users today. They rely on a large number of features to compute the rank of search results for each given query. The estimated reputation of pages is among the effective features available for search engine designers, probably being adopted by most current commercial search engines. Page reputation is estimated by analyzing the linkage relationships between pages. This information is used by link analysis algorithms as a query-independent feature, to be taken into account when computing the rank of the results. Unfortunately, several types of links found on the web may damage the estimated page reputation and thus cause a negative effect on the quality of search results. This work studies alternatives to reduce the negative impact of such noisy links. More specifically, the authors propose and evaluate new methods that deal with noisy links, considering scenarios where the reputation of pages is computed using the PageRank algorithm. They show, through experiments with real web content, that their methods achieve significant improvements when compared to previous solutions proposed in the literature.
KW  - information retrieval software
KW  - information storage and retrieval systems
KW  - search engines
UR  - http://www.scopus.com/inward/record.url?scp=84868203478&partnerID=8YFLogxK
U2  - 10.1002/asi.22729
DO  - 10.1002/asi.22729
M3  - Article
AN  - SCOPUS:84868203478
VL  - 63
SP  - 2294
EP  - 2312
JO  - Journal of the American Society for Information Science and Technology
JF  - Journal of the American Society for Information Science and Technology
SN  - 1532-2882
IS  - 11
ER  - 
