Details
Original language | English |
---|---|
Pages (from-to) | 2019-2040 |
Number of pages | 22 |
Journal | Computer networks |
Volume | 54 |
Issue number | 12 |
Publication status | Published - 26 Aug 2010 |
Abstract
Distributed hash tables (DHTs) are very efficient for querying based on key lookups. However, building huge term indexes, as required for IR-style keyword search, poses a scalability challenge for plain DHTs. Due to the large sizes of document term vocabularies, peers joining the network cause huge amounts of key inserts and, consequently, a large number of index maintenance messages. Thus, the key to exploiting DHTs for distributed information retrieval is to reduce index maintenance costs. Various approaches in this direction have been pursued, including the use of hybrid infrastructures, or changing the granularity of the inverted index to peer level. We show that indexing costs can be significantly reduced further by letting peers form groups in a self-organized fashion. Instead of each individual peer submitting index information separately, all peers of a group cooperate to publish the index updates to the DHT in batches. Our evaluation shows that this approach reduces index maintenance cost by an order of magnitude, while still keeping a complete and correct term index for query processing.
Keywords
- DHT, Hybrid topologies, P2P information retrieval
ASJC Scopus subject areas
- Computer Science(all)
- Computer Networks and Communications
Cite this
- Standard
- Harvard
- Apa
- Vancouver
- BibTeX
- RIS
In: Computer networks, Vol. 54, No. 12, 26.08.2010, p. 2019-2040.
Research output: Contribution to journal › Article › Research › peer review
}
TY - JOUR
T1 - PCIR
T2 - Combining DHTs and peer clusters for efficient full-text P2P indexing
AU - Papapetrou, Odysseas
AU - Siberski, Wolf
AU - Nejdl, Wolfgang
PY - 2010/8/26
Y1 - 2010/8/26
N2 - Distributed hash tables (DHTs) are very efficient for querying based on key lookups. However, building huge term indexes, as required for IR-style keyword search, poses a scalability challenge for plain DHTs. Due to the large sizes of document term vocabularies, peers joining the network cause huge amounts of key inserts and, consequently, a large number of index maintenance messages. Thus, the key to exploiting DHTs for distributed information retrieval is to reduce index maintenance costs. Various approaches in this direction have been pursued, including the use of hybrid infrastructures, or changing the granularity of the inverted index to peer level. We show that indexing costs can be significantly reduced further by letting peers form groups in a self-organized fashion. Instead of each individual peer submitting index information separately, all peers of a group cooperate to publish the index updates to the DHT in batches. Our evaluation shows that this approach reduces index maintenance cost by an order of magnitude, while still keeping a complete and correct term index for query processing.
AB - Distributed hash tables (DHTs) are very efficient for querying based on key lookups. However, building huge term indexes, as required for IR-style keyword search, poses a scalability challenge for plain DHTs. Due to the large sizes of document term vocabularies, peers joining the network cause huge amounts of key inserts and, consequently, a large number of index maintenance messages. Thus, the key to exploiting DHTs for distributed information retrieval is to reduce index maintenance costs. Various approaches in this direction have been pursued, including the use of hybrid infrastructures, or changing the granularity of the inverted index to peer level. We show that indexing costs can be significantly reduced further by letting peers form groups in a self-organized fashion. Instead of each individual peer submitting index information separately, all peers of a group cooperate to publish the index updates to the DHT in batches. Our evaluation shows that this approach reduces index maintenance cost by an order of magnitude, while still keeping a complete and correct term index for query processing.
KW - DHT
KW - Hybrid topologies
KW - P2P information retrieval
UR - http://www.scopus.com/inward/record.url?scp=77955426847&partnerID=8YFLogxK
U2 - 10.1016/j.comnet.2010.03.025
DO - 10.1016/j.comnet.2010.03.025
M3 - Article
AN - SCOPUS:77955426847
VL - 54
SP - 2019
EP - 2040
JO - Computer networks
JF - Computer networks
SN - 1389-1286
IS - 12
ER -