Details
Original language | English |
---|---|
Pages (from-to) | 159-170 |
Number of pages | 12 |
Journal | International Journal of Multimedia Information Retrieval |
Volume | 11 |
Issue number | 2 |
Early online date | 12 Apr 2022 |
Publication status | Published - June 2022 |
Abstract
Near-duplicate detection in a dataset involves finding the elements that are closest to a new query element according to a given similarity function and proximity threshold. The brute force approach is very computationally intensive as it evaluates the similarity between the queried item and all items in the dataset. The potential application domain is an image sharing website that checks for plagiarism or piracy every time a new image is uploaded. Among the various approaches, near-duplicate detection was effectively addressed by SimPair LSH (Fisichella et al., in Decker, Lhotská, Link, Spies, Wagner (eds) Database and expert systems applications, Springer, 2014). As the name suggests, SimPair LSH uses locality sensitive hashing (LSH) and computes and stores in advance a small set of near-duplicate pairs present in the dataset and uses them to reduce the candidate set returned for a given query using the Triangle inequality. We develop an algorithm that predicts how the candidate set will be reduced. We also develop a new efficient method for near-duplicate image detection using a deep Siamese coding neural network that is able to extract effective features from images useful for building LSH indices. Extensive experiments on two benchmark datasets confirm the effectiveness of our deep Siamese coding network and prediction algorithm.
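The abstract's central idea, using stored near-duplicate pairs and the triangle inequality to shrink a candidate set, can be illustrated with a small sketch. This is not the published SimPair LSH algorithm. It assumes Euclidean distance as the metric, a candidate list that has already been returned by some LSH index, and a hypothetical `pair_dist` dictionary holding the precomputed pair distances. All function and parameter names are invented for illustration.

```python
import math


def filter_candidates(query, candidates, points, pair_dist, tau):
    """Illustrative triangle-inequality filter (assumed names, not the paper's API).

    query      -- feature vector of the new item
    candidates -- ids returned by an LSH lookup (assumed to be given)
    points     -- dict: id -> feature vector
    pair_dist  -- dict: id -> {neighbour id: precomputed distance}
    tau        -- distance threshold for "near-duplicate"

    Returns (set of near-duplicate ids, number of distances actually computed).
    """
    cand = set(candidates)
    accepted, pruned = set(), set()
    computed = 0
    for a in candidates:
        if a in accepted or a in pruned:
            continue  # already decided through a stored pair
        d_qa = math.dist(query, points[a])
        computed += 1
        if d_qa <= tau:
            accepted.add(a)
        else:
            pruned.add(a)
        # Use stored pairs (a, b) to decide b without computing d(query, b).
        for b, d_ab in pair_dist.get(a, {}).items():
            if b not in cand or b in accepted or b in pruned:
                continue
            if abs(d_qa - d_ab) > tau:
                # Lower bound: d(q, b) >= |d(q, a) - d(a, b)| > tau
                pruned.add(b)
            elif d_qa + d_ab <= tau:
                # Upper bound: d(q, b) <= d(q, a) + d(a, b) <= tau
                accepted.add(b)
    return accepted, computed
```

With two tight clusters of two points each and one stored pair per cluster, a query near the first cluster is resolved with two distance computations instead of four. Both bounds are exact consequences of the triangle inequality, so the filter returns the same result as a brute-force scan of the candidate list.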
ASJC Scopus subject areas
- Computer Science (all)
- Information Systems
- Engineering (all)
- Media Technology
- Social Sciences (all)
- Library and Information Sciences
Cite this
In: International Journal of Multimedia Information Retrieval, Vol. 11, No. 2, 06.2022, p. 159-170.
Research output: Contribution to journal › Article › Research › Peer review
TY - JOUR
T1 - Siamese coding network and pair similarity prediction for near-duplicate image detection
AU - Fisichella, Marco
N1 - Funding Information: This work was partly funded by the SoMeCliCS project under the Volkswagen Stiftung and the Niedersächsisches Ministerium für Wissenschaft und Kultur.
PY - 2022/6
Y1 - 2022/6
N2 - Near-duplicate detection in a dataset involves finding the elements that are closest to a new query element according to a given similarity function and proximity threshold. The brute force approach is very computationally intensive as it evaluates the similarity between the queried item and all items in the dataset. The potential application domain is an image sharing website that checks for plagiarism or piracy every time a new image is uploaded. Among the various approaches, near-duplicate detection was effectively addressed by SimPair LSH (Fisichella et al., in Decker, Lhotská, Link, Spies, Wagner (eds) Database and expert systems applications, Springer, 2014). As the name suggests, SimPair LSH uses locality sensitive hashing (LSH) and computes and stores in advance a small set of near-duplicate pairs present in the dataset and uses them to reduce the candidate set returned for a given query using the Triangle inequality. We develop an algorithm that predicts how the candidate set will be reduced. We also develop a new efficient method for near-duplicate image detection using a deep Siamese coding neural network that is able to extract effective features from images useful for building LSH indices. Extensive experiments on two benchmark datasets confirm the effectiveness of our deep Siamese coding network and prediction algorithm.
KW - Deep features extraction
KW - High-dimensional datasets
KW - Indexing methods
KW - Locality sensitive hashing
KW - Near-duplicate image detection
UR - http://www.scopus.com/inward/record.url?scp=85128067967&partnerID=8YFLogxK
U2 - 10.1007/s13735-022-00233-w
DO - 10.1007/s13735-022-00233-w
M3 - Article
AN - SCOPUS:85128067967
VL - 11
SP - 159
EP - 170
JO - International Journal of Multimedia Information Retrieval
JF - International Journal of Multimedia Information Retrieval
SN - 2192-6611
IS - 2
ER -