Details
Original language | English |
---|---|
Title of host publication | 2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings |
Publication status | Published - 22 Nov 2010 |
Event | 2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Delft, Netherlands. Duration: 25 Aug 2010 → 27 Aug 2010 |
Publication series
Name | 2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings |
---|---|
Abstract
In this paper, we propose a probabilistic algorithm for detecting near-duplicate text, audio, and video resources efficiently and effectively in large-scale P2P systems. To this end, we present a thorough cost and probabilistic analysis that allows the algorithm to adapt to the characteristics of the network and the data collection in order to minimize network cost. In addition, we extend the algorithm so that it can identify similar videos even if some of them are split into multiple files. A theoretical analysis, together with a large-scale experimental evaluation on networks of up to 100,000 peers using real-world datasets totaling more than 200 GB, demonstrates the viability of our approach.
ASJC Scopus subject areas
- Computer Science (all)
- Computational Theory and Mathematics
- Computer Science (all)
- Computer Science Applications
- Mathematics (all)
- Theoretical Computer Science
Cite this
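Optimizing Near Duplicate Detection for P2P Networks. / Papapetrou, Odysseas; Ramesh, Sukriti; Siersdorfer, Stefan; Nejdl, Wolfgang. 2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings. 2010. 5570001 (2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings).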
Publication: Contribution to book/report/anthology/conference proceedings › Conference paper › Research › Peer-reviewed
TY - GEN
T1 - Optimizing Near Duplicate Detection for P2P Networks
AU - Papapetrou, Odysseas
AU - Ramesh, Sukriti
AU - Siersdorfer, Stefan
AU - Nejdl, Wolfgang
PY - 2010/11/22
Y1 - 2010/11/22
N2 - In this paper, we propose a probabilistic algorithm for detecting near duplicate text, audio, and video resources efficiently and effectively in large-scale P2P systems. To this end, we present a thorough cost and probabilistic analysis that allows the algorithm to adapt to network and data collection characteristics for minimizing network cost. In addition, we extend the algorithm so that it can identify similar videos, even if some of the videos are split into different files. A thorough theoretical analysis as well as a large-scale experimental evaluation on networks of up to 100,000 peers using real-world datasets of more than 200 Gbytes demonstrate the viability of our approach.
AB - In this paper, we propose a probabilistic algorithm for detecting near duplicate text, audio, and video resources efficiently and effectively in large-scale P2P systems. To this end, we present a thorough cost and probabilistic analysis that allows the algorithm to adapt to network and data collection characteristics for minimizing network cost. In addition, we extend the algorithm so that it can identify similar videos, even if some of the videos are split into different files. A thorough theoretical analysis as well as a large-scale experimental evaluation on networks of up to 100,000 peers using real-world datasets of more than 200 Gbytes demonstrate the viability of our approach.
UR - http://www.scopus.com/inward/record.url?scp=78349238180&partnerID=8YFLogxK
U2 - 10.1109/P2P.2010.5570001
DO - 10.1109/P2P.2010.5570001
M3 - Conference contribution
AN - SCOPUS:78349238180
SN - 9781424471416
T3 - 2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings
BT - 2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010 - Proceedings
T2 - 2010 IEEE 10th International Conference on Peer-to-Peer Computing, P2P 2010
Y2 - 25 August 2010 through 27 August 2010
ER -