Optimizing Near Duplicate Detection for P2P Networks

Odysseas Papapetrou; Sukriti Ramesh; Stefan Siersdorfer; Wolfgang Nejdl

doi:10.1109/P2P.2010.5570001

Details

Original language	English
Title of host publication	2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings
Publication status	Published - 22 Nov 2010
Event	2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Delft, Netherlands
Duration	25 Aug 2010 → 27 Aug 2010

Publication series

Name	2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings

Abstract

In this paper, we propose a probabilistic algorithm for detecting near duplicate text, audio, and video resources efficiently and effectively in large-scale P2P systems. To this end, we present a thorough cost and probabilistic analysis that allows the algorithm to adapt to network and data collection characteristics for minimizing network cost. In addition, we extend the algorithm so that it can identify similar videos, even if some of the videos are split into different files. A thorough theoretical analysis as well as a large-scale experimental evaluation on networks of up to 100,000 peers using real-world datasets of more than 200 Gbytes demonstrate the viability of our approach.

ASJC Scopus subject areas

Computer Science(all)
Computational Theory and Mathematics
Computer Science Applications
Mathematics(all)
Theoretical Computer Science

Cite this

Optimizing Near Duplicate Detection for P2P Networks. / Papapetrou, Odysseas; Ramesh, Sukriti; Siersdorfer, Stefan et al.
2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings. 2010. 5570001 (2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings).

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Papapetrou, O, Ramesh, S, Siersdorfer, S & Nejdl, W 2010, Optimizing Near Duplicate Detection for P2P Networks. in 2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings., 5570001, 2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings, 2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010, Delft, Netherlands, 25 Aug 2010. https://doi.org/10.1109/P2P.2010.5570001

Papapetrou, O., Ramesh, S., Siersdorfer, S., & Nejdl, W. (2010). Optimizing Near Duplicate Detection for P2P Networks. In 2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings Article 5570001 (2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings). https://doi.org/10.1109/P2P.2010.5570001

Papapetrou O, Ramesh S, Siersdorfer S, Nejdl W. Optimizing Near Duplicate Detection for P2P Networks. In 2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings. 2010. 5570001. (2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings). doi: 10.1109/P2P.2010.5570001

Papapetrou, Odysseas ; Ramesh, Sukriti ; Siersdorfer, Stefan et al. / Optimizing Near Duplicate Detection for P2P Networks. 2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings. 2010. (2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings).


@inproceedings{e1dc42e13c9a48d587a78c5504ed586b,
  title = "Optimizing Near Duplicate Detection for P2P Networks",
  abstract = "In this paper, we propose a probabilistic algorithm for detecting near duplicate text, audio, and video resources efficiently and effectively in large-scale P2P systems. To this end, we present a thorough cost and probabilistic analysis that allows the algorithm to adapt to network and data collection characteristics for minimizing network cost. In addition, we extend the algorithm so that it can identify similar videos, even if some of the videos are split into different files. A thorough theoretical analysis as well as a large-scale experimental evaluation on networks of up to 100,000 peers using real-world datasets of more than 200 Gbytes demonstrate the viability of our approach.",
  author = "Odysseas Papapetrou and Sukriti Ramesh and Stefan Siersdorfer and Wolfgang Nejdl",
  year = "2010",
  month = nov,
  day = "22",
  doi = "10.1109/P2P.2010.5570001",
  language = "English",
  isbn = "9781424471416",
  series = "2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings",
  booktitle = "2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings",
  note = "2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 ; Conference date: 25-08-2010 Through 27-08-2010",
}


TY  - GEN
T1  - Optimizing Near Duplicate Detection for P2P Networks
AU  - Papapetrou, Odysseas
AU  - Ramesh, Sukriti
AU  - Siersdorfer, Stefan
AU  - Nejdl, Wolfgang
PY  - 2010/11/22
Y1  - 2010/11/22
N2  - In this paper, we propose a probabilistic algorithm for detecting near duplicate text, audio, and video resources efficiently and effectively in large-scale P2P systems. To this end, we present a thorough cost and probabilistic analysis that allows the algorithm to adapt to network and data collection characteristics for minimizing network cost. In addition, we extend the algorithm so that it can identify similar videos, even if some of the videos are split into different files. A thorough theoretical analysis as well as a large-scale experimental evaluation on networks of up to 100,000 peers using real-world datasets of more than 200 Gbytes demonstrate the viability of our approach.
AB  - In this paper, we propose a probabilistic algorithm for detecting near duplicate text, audio, and video resources efficiently and effectively in large-scale P2P systems. To this end, we present a thorough cost and probabilistic analysis that allows the algorithm to adapt to network and data collection characteristics for minimizing network cost. In addition, we extend the algorithm so that it can identify similar videos, even if some of the videos are split into different files. A thorough theoretical analysis as well as a large-scale experimental evaluation on networks of up to 100,000 peers using real-world datasets of more than 200 Gbytes demonstrate the viability of our approach.
UR  - http://www.scopus.com/inward/record.url?scp=78349238180&partnerID=8YFLogxK
U2  - 10.1109/P2P.2010.5570001
DO  - 10.1109/P2P.2010.5570001
M3  - Conference contribution
AN  - SCOPUS:78349238180
SN  - 9781424471416
T3  - 2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings
BT  - 2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings
T2  - 2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010
Y2  - 25 August 2010 through 27 August 2010
ER  -
