Optimizing Near Duplicate Detection for P2P Networks

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Details

Original language: English
Title of host publication: 2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings
Publication status: Published - 22 Nov 2010
Event: 2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Delft, Netherlands
Duration: 25 Aug 2010 to 27 Aug 2010

Publication series

Name: 2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings

Abstract

In this paper, we propose a probabilistic algorithm for detecting near duplicate text, audio, and video resources efficiently and effectively in large-scale P2P systems. To this end, we present a thorough cost and probabilistic analysis that allows the algorithm to adapt to network and data collection characteristics for minimizing network cost. In addition, we extend the algorithm so that it can identify similar videos, even if some of the videos are split into different files. A thorough theoretical analysis as well as a large-scale experimental evaluation on networks of up to 100,000 peers using real-world datasets of more than 200 Gbytes demonstrate the viability of our approach.

Cite this

Optimizing Near Duplicate Detection for P2P Networks. / Papapetrou, Odysseas; Ramesh, Sukriti; Siersdorfer, Stefan et al.
2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings. 2010. 5570001 (2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings).

Papapetrou, O, Ramesh, S, Siersdorfer, S & Nejdl, W 2010, Optimizing Near Duplicate Detection for P2P Networks. in 2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings, 5570001, 2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings, 2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010, Delft, Netherlands, 25 Aug 2010. https://doi.org/10.1109/P2P.2010.5570001
Papapetrou, O., Ramesh, S., Siersdorfer, S., & Nejdl, W. (2010). Optimizing Near Duplicate Detection for P2P Networks. In 2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings Article 5570001 (2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings). https://doi.org/10.1109/P2P.2010.5570001
Papapetrou O, Ramesh S, Siersdorfer S, Nejdl W. Optimizing Near Duplicate Detection for P2P Networks. In 2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings. 2010. 5570001. (2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings). doi: 10.1109/P2P.2010.5570001
Papapetrou, Odysseas ; Ramesh, Sukriti ; Siersdorfer, Stefan et al. / Optimizing Near Duplicate Detection for P2P Networks. 2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings. 2010. (2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings).
@inproceedings{e1dc42e13c9a48d587a78c5504ed586b,
title = "Optimizing Near Duplicate Detection for P2P Networks",
abstract = "In this paper, we propose a probabilistic algorithm for detecting near duplicate text, audio, and video resources efficiently and effectively in large-scale P2P systems. To this end, we present a thorough cost and probabilistic analysis that allows the algorithm to adapt to network and data collection characteristics for minimizing network cost. In addition, we extend the algorithm so that it can identify similar videos, even if some of the videos are split into different files. A thorough theoretical analysis as well as a large-scale experimental evaluation on networks of up to 100,000 peers using real-world datasets of more than 200 Gbytes demonstrate the viability of our approach.",
author = "Odysseas Papapetrou and Sukriti Ramesh and Stefan Siersdorfer and Wolfgang Nejdl",
year = "2010",
month = nov,
day = "22",
doi = "10.1109/P2P.2010.5570001",
language = "English",
isbn = "9781424471416",
series = "2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings",
booktitle = "2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings",
note = "2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 ; Conference date: 25-08-2010 Through 27-08-2010",

}

TY  - GEN
T1  - Optimizing Near Duplicate Detection for P2P Networks
AU  - Papapetrou, Odysseas
AU  - Ramesh, Sukriti
AU  - Siersdorfer, Stefan
AU  - Nejdl, Wolfgang
PY  - 2010/11/22
Y1  - 2010/11/22
N2  - In this paper, we propose a probabilistic algorithm for detecting near duplicate text, audio, and video resources efficiently and effectively in large-scale P2P systems. To this end, we present a thorough cost and probabilistic analysis that allows the algorithm to adapt to network and data collection characteristics for minimizing network cost. In addition, we extend the algorithm so that it can identify similar videos, even if some of the videos are split into different files. A thorough theoretical analysis as well as a large-scale experimental evaluation on networks of up to 100,000 peers using real-world datasets of more than 200 Gbytes demonstrate the viability of our approach.
AB  - In this paper, we propose a probabilistic algorithm for detecting near duplicate text, audio, and video resources efficiently and effectively in large-scale P2P systems. To this end, we present a thorough cost and probabilistic analysis that allows the algorithm to adapt to network and data collection characteristics for minimizing network cost. In addition, we extend the algorithm so that it can identify similar videos, even if some of the videos are split into different files. A thorough theoretical analysis as well as a large-scale experimental evaluation on networks of up to 100,000 peers using real-world datasets of more than 200 Gbytes demonstrate the viability of our approach.
UR  - http://www.scopus.com/inward/record.url?scp=78349238180&partnerID=8YFLogxK
U2  - 10.1109/P2P.2010.5570001
DO  - 10.1109/P2P.2010.5570001
M3  - Conference contribution
AN  - SCOPUS:78349238180
SN  - 9781424471416
T3  - 2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings
BT  - 2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings
T2  - 2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010
Y2  - 25 August 2010 through 27 August 2010
ER  - 

By the same author(s)