Optimizing Near Duplicate Detection for P2P Networks

Odysseas Papapetrou; Sukriti Ramesh; Stefan Siersdorfer; Wolfgang Nejdl

doi:10.1109/P2P.2010.5570001

Details

Original language	English
Title of host publication	2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings
Publication status	Published - 22 Nov 2010
Event	2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Delft, Netherlands. Duration: 25 Aug 2010 → 27 Aug 2010

Publication series

Name	2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings

Abstract

In this paper, we propose a probabilistic algorithm for detecting near duplicate text, audio, and video resources efficiently and effectively in large-scale P2P systems. To this end, we present a thorough cost and probabilistic analysis that allows the algorithm to adapt to network and data collection characteristics for minimizing network cost. In addition, we extend the algorithm so that it can identify similar videos, even if some of the videos are split into different files. A thorough theoretical analysis as well as a large-scale experimental evaluation on networks of up to 100,000 peers using real-world datasets of more than 200 Gbytes demonstrate the viability of our approach.

ASJC Scopus subject areas

Computer Science (all)
Computational Theory and Mathematics
Computer Science (all)
Computer Science Applications
Mathematics (all)
Theoretical Computer Science

Cite this

Optimizing Near Duplicate Detection for P2P Networks. / Papapetrou, Odysseas; Ramesh, Sukriti; Siersdorfer, Stefan et al.
2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings. 2010. 5570001 (2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings).

Publication: Chapter in book/report/conference proceeding › Conference contribution › Research › Peer review

Papapetrou, O, Ramesh, S, Siersdorfer, S & Nejdl, W 2010, Optimizing Near Duplicate Detection for P2P Networks. in 2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings., 5570001, 2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings, 2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010, Delft, Netherlands, 25 Aug. 2010. https://doi.org/10.1109/P2P.2010.5570001

Papapetrou, O., Ramesh, S., Siersdorfer, S., & Nejdl, W. (2010). Optimizing Near Duplicate Detection for P2P Networks. In 2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings Article 5570001 (2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings). https://doi.org/10.1109/P2P.2010.5570001

Papapetrou O, Ramesh S, Siersdorfer S, Nejdl W. Optimizing Near Duplicate Detection for P2P Networks. in 2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings. 2010. 5570001. (2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings). doi: 10.1109/P2P.2010.5570001

Papapetrou, Odysseas ; Ramesh, Sukriti ; Siersdorfer, Stefan et al. / Optimizing Near Duplicate Detection for P2P Networks. 2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings. 2010. (2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings).


@inproceedings{e1dc42e13c9a48d587a78c5504ed586b,
  title = "Optimizing Near Duplicate Detection for P2P Networks",
  abstract = "In this paper, we propose a probabilistic algorithm for detecting near duplicate text, audio, and video resources efficiently and effectively in large-scale P2P systems. To this end, we present a thorough cost and probabilistic analysis that allows the algorithm to adapt to network and data collection characteristics for minimizing network cost. In addition, we extend the algorithm so that it can identify similar videos, even if some of the videos are split into different files. A thorough theoretical analysis as well as a large-scale experimental evaluation on networks of up to 100,000 peers using real-world datasets of more than 200 Gbytes demonstrate the viability of our approach.",
  author = "Odysseas Papapetrou and Sukriti Ramesh and Stefan Siersdorfer and Wolfgang Nejdl",
  year = "2010",
  month = nov,
  day = "22",
  doi = "10.1109/P2P.2010.5570001",
  language = "English",
  isbn = "9781424471416",
  series = "2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings",
  booktitle = "2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings",
  note = "2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 ; Conference date: 25-08-2010 Through 27-08-2010",
}


TY - GEN
T1 - Optimizing Near Duplicate Detection for P2P Networks
AU - Papapetrou, Odysseas
AU - Ramesh, Sukriti
AU - Siersdorfer, Stefan
AU - Nejdl, Wolfgang
PY - 2010/11/22
Y1 - 2010/11/22
N2 - In this paper, we propose a probabilistic algorithm for detecting near duplicate text, audio, and video resources efficiently and effectively in large-scale P2P systems. To this end, we present a thorough cost and probabilistic analysis that allows the algorithm to adapt to network and data collection characteristics for minimizing network cost. In addition, we extend the algorithm so that it can identify similar videos, even if some of the videos are split into different files. A thorough theoretical analysis as well as a large-scale experimental evaluation on networks of up to 100,000 peers using real-world datasets of more than 200 Gbytes demonstrate the viability of our approach.
AB - In this paper, we propose a probabilistic algorithm for detecting near duplicate text, audio, and video resources efficiently and effectively in large-scale P2P systems. To this end, we present a thorough cost and probabilistic analysis that allows the algorithm to adapt to network and data collection characteristics for minimizing network cost. In addition, we extend the algorithm so that it can identify similar videos, even if some of the videos are split into different files. A thorough theoretical analysis as well as a large-scale experimental evaluation on networks of up to 100,000 peers using real-world datasets of more than 200 Gbytes demonstrate the viability of our approach.
UR - http://www.scopus.com/inward/record.url?scp=78349238180&partnerID=8YFLogxK
U2 - 10.1109/P2P.2010.5570001
DO - 10.1109/P2P.2010.5570001
M3 - Conference contribution
AN - SCOPUS:78349238180
SN - 9781424471416
T3 - 2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings
BT - 2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings
T2 - 2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010
Y2 - 25 August 2010 through 27 August 2010
ER -
