PCIR: Combining DHTs and peer clusters for efficient full-text P2P indexing

Odysseas Papapetrou; Wolf Siberski; Wolfgang Nejdl

doi:10.1016/j.comnet.2010.03.025

Details

Original language	English
Pages (from-to)	2019-2040
Number of pages	22
Journal	Computer networks
Volume	54
Issue number	12
Publication status	Published - 26 Aug 2010

Abstract

Distributed hash tables (DHTs) are very efficient for querying based on key lookups. However, building huge term indexes, as required for IR-style keyword search, poses a scalability challenge for plain DHTs. Due to the large sizes of document term vocabularies, peers joining the network cause huge amounts of key inserts and, consequently, a large number of index maintenance messages. Thus, the key to exploiting DHTs for distributed information retrieval is to reduce index maintenance costs. Various approaches in this direction have been pursued, including the use of hybrid infrastructures, or changing the granularity of the inverted index to peer level. We show that indexing costs can be significantly reduced further by letting peers form groups in a self-organized fashion. Instead of each individual peer submitting index information separately, all peers of a group cooperate to publish the index updates to the DHT in batches. Our evaluation shows that this approach reduces index maintenance cost by an order of magnitude, while still keeping a complete and correct term index for query processing.
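The batching idea in the abstract can be illustrated with a toy message count. The sketch below is not the PCIR protocol itself: the function names, the peer and term data, and the cost model (one DHT message per published term entry, with intra-group traffic ignored) are all assumptions made for illustration. It compares each peer publishing its terms separately against each group merging its members' terms and publishing one entry per distinct term, and it builds the resulting peer-level index in both cases.

```python
from collections import defaultdict


def publish_individually(peer_terms):
    """Each peer sends one DHT insert per term it holds (assumed cost model)."""
    index = defaultdict(set)
    messages = 0
    for peer, terms in peer_terms.items():
        for term in terms:
            index[term].add(peer)
            messages += 1
    return dict(index), messages


def publish_in_batches(peer_terms, groups):
    """Each group merges its members' terms and sends one insert per distinct term.

    Intra-group coordination messages are deliberately not counted here.
    """
    index = defaultdict(set)
    messages = 0
    for members in groups:
        merged = defaultdict(set)
        for peer in members:
            for term in peer_terms[peer]:
                merged[term].add(peer)
        # One DHT message per distinct term, carrying every group member that holds it.
        for term, peers in merged.items():
            index[term] |= peers
            messages += 1
    return dict(index), messages


# Hypothetical example data: four peers in two groups.
peer_terms = {
    "p1": {"dht", "index", "peer"},
    "p2": {"dht", "index", "query"},
    "p3": {"dht", "cluster"},
    "p4": {"index", "cluster"},
}
groups = [["p1", "p2"], ["p3", "p4"]]

solo_index, solo_msgs = publish_individually(peer_terms)
batch_index, batch_msgs = publish_in_batches(peer_terms, groups)
print(solo_msgs, batch_msgs, solo_index == batch_index)
```

In this toy setting the batched variant needs 7 messages instead of 10 and produces the same term-to-peer index. The saving grows with the vocabulary overlap inside a group, which is only a rough intuition for the order-of-magnitude reduction the abstract reports for the real system.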

ASJC Scopus subject areas

Computer Science (all)
Computer Networks and Communications

Cite this

PCIR: Combining DHTs and peer clusters for efficient full-text P2P indexing. / Papapetrou, Odysseas; Siberski, Wolf; Nejdl, Wolfgang.
In: Computer networks, Vol. 54, No. 12, 26.08.2010, p. 2019-2040.

Publication: Contribution to journal › Article › Research › Peer-reviewed

Papapetrou, O, Siberski, W & Nejdl, W 2010, 'PCIR: Combining DHTs and peer clusters for efficient full-text P2P indexing', Computer networks, vol. 54, no. 12, pp. 2019-2040. https://doi.org/10.1016/j.comnet.2010.03.025

Papapetrou, O., Siberski, W., & Nejdl, W. (2010). PCIR: Combining DHTs and peer clusters for efficient full-text P2P indexing. Computer networks, 54(12), 2019-2040. https://doi.org/10.1016/j.comnet.2010.03.025

Papapetrou O, Siberski W, Nejdl W. PCIR: Combining DHTs and peer clusters for efficient full-text P2P indexing. Computer networks. 2010 Aug 26;54(12):2019-2040. doi: 10.1016/j.comnet.2010.03.025

Papapetrou, Odysseas ; Siberski, Wolf ; Nejdl, Wolfgang. / PCIR : Combining DHTs and peer clusters for efficient full-text P2P indexing. In: Computer networks. 2010 ; Vol. 54, No. 12. pp. 2019-2040.

@article{69c88ad835004ec09050d8427c2fc2ba,

title = "PCIR: Combining DHTs and peer clusters for efficient full-text P2P indexing",

abstract = "Distributed hash tables (DHTs) are very efficient for querying based on key lookups. However, building huge term indexes, as required for IR-style keyword search, poses a scalability challenge for plain DHTs. Due to the large sizes of document term vocabularies, peers joining the network cause huge amounts of key inserts and, consequently, a large number of index maintenance messages. Thus, the key to exploiting DHTs for distributed information retrieval is to reduce index maintenance costs. Various approaches in this direction have been pursued, including the use of hybrid infrastructures, or changing the granularity of the inverted index to peer level. We show that indexing costs can be significantly reduced further by letting peers form groups in a self-organized fashion. Instead of each individual peer submitting index information separately, all peers of a group cooperate to publish the index updates to the DHT in batches. Our evaluation shows that this approach reduces index maintenance cost by an order of magnitude, while still keeping a complete and correct term index for query processing.",

keywords = "DHT, Hybrid topologies, P2P information retrieval",

author = "Odysseas Papapetrou and Wolf Siberski and Wolfgang Nejdl",

year = "2010",

month = aug,

day = "26",

doi = "10.1016/j.comnet.2010.03.025",

language = "English",

volume = "54",

pages = "2019--2040",

journal = "Computer networks",

issn = "1389-1286",

publisher = "Elsevier",

number = "12",

}

TY - JOUR

T1 - PCIR

T2 - Combining DHTs and peer clusters for efficient full-text P2P indexing

AU - Papapetrou, Odysseas

AU - Siberski, Wolf

AU - Nejdl, Wolfgang

PY - 2010/8/26

Y1 - 2010/8/26

N2 - Distributed hash tables (DHTs) are very efficient for querying based on key lookups. However, building huge term indexes, as required for IR-style keyword search, poses a scalability challenge for plain DHTs. Due to the large sizes of document term vocabularies, peers joining the network cause huge amounts of key inserts and, consequently, a large number of index maintenance messages. Thus, the key to exploiting DHTs for distributed information retrieval is to reduce index maintenance costs. Various approaches in this direction have been pursued, including the use of hybrid infrastructures, or changing the granularity of the inverted index to peer level. We show that indexing costs can be significantly reduced further by letting peers form groups in a self-organized fashion. Instead of each individual peer submitting index information separately, all peers of a group cooperate to publish the index updates to the DHT in batches. Our evaluation shows that this approach reduces index maintenance cost by an order of magnitude, while still keeping a complete and correct term index for query processing.

KW - DHT

KW - Hybrid topologies

KW - P2P information retrieval

UR - http://www.scopus.com/inward/record.url?scp=77955426847&partnerID=8YFLogxK

U2 - 10.1016/j.comnet.2010.03.025

DO - 10.1016/j.comnet.2010.03.025

M3 - Article

AN - SCOPUS:77955426847

VL - 54

SP - 2019

EP - 2040

JO - Computer networks

JF - Computer networks

SN - 1389-1286

IS - 12

ER -
