Details
Original language | English |
---|---|
Title of host publication | Proceedings of the 15th International Conference on World Wide Web |
Publisher | Association for Computing Machinery (ACM) |
Pages | 73-82 |
Number of pages | 10 |
ISBN (print) | 1595933239, 9781595933232 |
Publication status | Published - 23 May 2006 |
Event | 15th International Conference on World Wide Web - Edinburgh, Scotland, United Kingdom (UK). Duration: 23 May 2006 → 26 May 2006 |
Publication series
Name | Proceedings of the 15th International Conference on World Wide Web |
---|
Abstract
The currently booming search engine industry has led many online organizations to try to artificially increase their ranking in order to attract more visitors to their web sites. At the same time, the growth of the web has inherently generated several navigational hyperlink structures that have a negative impact on the importance measures employed by current search engines. In this paper we propose and evaluate algorithms for identifying all these noisy links on the web graph, be they spam, simple relationships between real-world entities represented by sites, replication of content, etc. Unlike prior work, we target a different type of noisy link structure, residing at the site level instead of the page level. We thus investigate and remove site-level mutual reinforcement relationships, abnormal support coming from one site towards another, as well as complex link alliances between web sites. Our experiments with the link database of the TodoBR search engine show a very strong increase in the quality of the output rankings after applying our techniques.
Keywords
- Link analysis, Noise reduction, PageRank, Spam
ASJC Scopus subject areas
- Computer Science(all)
- Computer Networks and Communications
- Software
Cite this
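Site level noise removal for search engines. / Da Costa Carvalho, André Luiz; Chirita, Paul Alexandru; De Moura, Edleno Silva; Calado, Pável; Nejdl, Wolfgang. Proceedings of the 15th International Conference on World Wide Web. Association for Computing Machinery (ACM), 2006. p. 73-82 (Proceedings of the 15th International Conference on World Wide Web).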
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Site level noise removal for search engines
AU - Da Costa Carvalho, André Luiz
AU - Chirita, Paul Alexandru
AU - De Moura, Edleno Silva
AU - Calado, Pável
AU - Nejdl, Wolfgang
PY - 2006/5/23
Y1 - 2006/5/23
KW - Link analysis
KW - Noise reduction
KW - PageRank
KW - Spam
UR - http://www.scopus.com/inward/record.url?scp=34250686269&partnerID=8YFLogxK
U2 - 10.1145/1135777.1135793
DO - 10.1145/1135777.1135793
M3 - Conference contribution
AN - SCOPUS:34250686269
SN - 1595933239
SN - 9781595933232
T3 - Proceedings of the 15th International Conference on World Wide Web
SP - 73
EP - 82
BT - Proceedings of the 15th International Conference on World Wide Web
PB - Association for Computing Machinery (ACM)
T2 - 15th International Conference on World Wide Web
Y2 - 23 May 2006 through 26 May 2006
ER -