Siamese coding network and pair similarity prediction for near-duplicate image detection

Research output: Contribution to journal, Article, Research, Peer-reviewed

Details

Original language: English
Pages (from - to): 159-170
Number of pages: 12
Journal: International Journal of Multimedia Information Retrieval
Volume: 11
Issue number: 2
Early online date: 12 Apr 2022
Publication status: Published - June 2022

Abstract

Near-duplicate detection in a dataset involves finding the elements that are closest to a new query element according to a given similarity function and proximity threshold. The brute-force approach is computationally expensive because it evaluates the similarity between the query item and every item in the dataset. A typical application is an image-sharing website that checks for plagiarism or piracy every time a new image is uploaded. Among the various approaches, near-duplicate detection was effectively addressed by SimPair LSH (Fisichella et al., in Decker, Lhotská, Link, Spies, Wagner (eds) Database and expert systems applications, Springer, 2014). As the name suggests, SimPair LSH uses locality-sensitive hashing (LSH). It also computes and stores in advance a small set of near-duplicate pairs present in the dataset, and uses them together with the triangle inequality to reduce the candidate set returned for a given query. We develop an algorithm that predicts how the candidate set will be reduced. We also develop a new, efficient method for near-duplicate image detection using a deep Siamese coding neural network that extracts image features suitable for building LSH indices. Extensive experiments on two benchmark datasets confirm the effectiveness of our deep Siamese coding network and prediction algorithm.
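
The candidate-set reduction mentioned in the abstract can be illustrated with a small sketch. This is not the published SimPair LSH algorithm; it only shows, under stated assumptions, how precomputed pair distances and the triangle inequality let some candidates be decided without a distance evaluation. The function name `filter_candidates`, the Euclidean metric, the dictionary layout of `pair_dist`, and the toy data are all assumptions made for illustration. The candidate list stands in for whatever an LSH index would return.

```python
import math


def euclidean(u, v):
    """Euclidean distance; any true metric would work for the bounds below."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def filter_candidates(query, candidates, points, pair_dist, tau):
    """Decide which candidates lie within distance tau of the query.

    pair_dist[a][b] holds a precomputed distance d(a, b) for stored pairs.
    Returns (accepted_ids, number_of_distance_evaluations).
    """
    cand = set(candidates)
    accepted, pruned = set(), set()
    computed = 0
    for a in candidates:
        if a in accepted or a in pruned:
            continue  # already decided through a stored pair
        d_qa = euclidean(query, points[a])
        computed += 1
        if d_qa <= tau:
            accepted.add(a)
        for b, d_ab in pair_dist.get(a, {}).items():
            if b not in cand or b in accepted or b in pruned:
                continue
            if d_qa - d_ab > tau:
                # Lower bound: d(q, b) >= d(q, a) - d(a, b) > tau.
                pruned.add(b)
            elif d_qa + d_ab <= tau:
                # Upper bound: d(q, b) <= d(q, a) + d(a, b) <= tau.
                accepted.add(b)
    return accepted, computed


# Toy data (hypothetical): two tight clusters of 2-D feature vectors.
points = {0: (0.0, 0.0), 1: (0.1, 0.0), 2: (5.0, 5.0), 3: (5.1, 5.0)}
pair_dist = {
    0: {1: euclidean(points[0], points[1])},
    1: {0: euclidean(points[1], points[0])},
    2: {3: euclidean(points[2], points[3])},
    3: {2: euclidean(points[3], points[2])},
}
near, n_computed = filter_candidates(
    (0.0, 0.05), [0, 1, 2, 3], points, pair_dist, tau=0.5
)
```

In this toy run only two of the four candidates need an explicit distance evaluation; the other two are decided through the stored pairs. Both bounds hold only if the distance is a true metric.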

Cite this

Siamese coding network and pair similarity prediction for near-duplicate image detection. / Fisichella, Marco.
In: International Journal of Multimedia Information Retrieval, Vol. 11, No. 2, 06.2022, pp. 159-170.

Fisichella, M 2022, 'Siamese coding network and pair similarity prediction for near-duplicate image detection', International Journal of Multimedia Information Retrieval, vol. 11, no. 2, pp. 159-170. https://doi.org/10.1007/s13735-022-00233-w
Fisichella M. Siamese coding network and pair similarity prediction for near-duplicate image detection. International Journal of Multimedia Information Retrieval. 2022 Jun;11(2):159-170. Epub 2022 Apr 12. doi: 10.1007/s13735-022-00233-w
Fisichella, Marco. / Siamese coding network and pair similarity prediction for near-duplicate image detection. In: International Journal of Multimedia Information Retrieval. 2022 ; Vol. 11, No. 2. pp. 159-170.
@article{bc6678b9c49b40469dcfc2b45d685889,
title = "Siamese coding network and pair similarity prediction for near-duplicate image detection",
abstract = "Near-duplicate detection in a dataset involves finding the elements that are closest to a new query element according to a given similarity function and proximity threshold. The brute force approach is very computationally intensive as it evaluates the similarity between the queried item and all items in the dataset. The potential application domain is an image sharing website that checks for plagiarism or piracy every time a new image is uploaded. Among the various approaches, near-duplicate detection was effectively addressed by SimPair LSH (Fisichella et al., in Decker, Lhotsk{\'a}, Link, Spies, Wagner (eds) Database and expert systems applications, Springer, 2014). As the name suggests, SimPair LSH uses locality sensitive hashing (LSH) and computes and stores in advance a small set of near-duplicate pairs present in the dataset and uses them to reduce the candidate set returned for a given query using the Triangle inequality. We develop an algorithm that predicts how the candidate set will be reduced. We also develop a new efficient method for near-duplicate image detection using a deep Siamese coding neural network that is able to extract effective features from images useful for building LSH indices. Extensive experiments on two benchmark datasets confirm the effectiveness of our deep Siamese coding network and prediction algorithm.",
keywords = "Deep features extraction, High-dimensional datasets, Indexing methods, Locality sensitive hashing, Near-duplicate image detection",
author = "Marco Fisichella",
note = "Funding Information: This work was partly funded by the SoMeCliCS project under the Volkswagen Stiftung und Nieders{\"a}chsisches Ministerium f{\"u}r Wissenschaft und Kultur. ",
year = "2022",
month = jun,
doi = "10.1007/s13735-022-00233-w",
language = "English",
volume = "11",
pages = "159--170",
number = "2",

}

TY - JOUR

T1 - Siamese coding network and pair similarity prediction for near-duplicate image detection

AU - Fisichella, Marco

N1 - Funding Information: This work was partly funded by the SoMeCliCS project under the Volkswagen Stiftung und Niedersächsisches Ministerium für Wissenschaft und Kultur.

PY - 2022/6

Y1 - 2022/6

AB - Near-duplicate detection in a dataset involves finding the elements that are closest to a new query element according to a given similarity function and proximity threshold. The brute force approach is very computationally intensive as it evaluates the similarity between the queried item and all items in the dataset. The potential application domain is an image sharing website that checks for plagiarism or piracy every time a new image is uploaded. Among the various approaches, near-duplicate detection was effectively addressed by SimPair LSH (Fisichella et al., in Decker, Lhotská, Link, Spies, Wagner (eds) Database and expert systems applications, Springer, 2014). As the name suggests, SimPair LSH uses locality sensitive hashing (LSH) and computes and stores in advance a small set of near-duplicate pairs present in the dataset and uses them to reduce the candidate set returned for a given query using the Triangle inequality. We develop an algorithm that predicts how the candidate set will be reduced. We also develop a new efficient method for near-duplicate image detection using a deep Siamese coding neural network that is able to extract effective features from images useful for building LSH indices. Extensive experiments on two benchmark datasets confirm the effectiveness of our deep Siamese coding network and prediction algorithm.

KW - Deep features extraction

KW - High-dimensional datasets

KW - Indexing methods

KW - Locality sensitive hashing

KW - Near-duplicate image detection

UR - http://www.scopus.com/inward/record.url?scp=85128067967&partnerID=8YFLogxK

U2 - 10.1007/s13735-022-00233-w

DO - 10.1007/s13735-022-00233-w

M3 - Article

AN - SCOPUS:85128067967

VL - 11

SP - 159

EP - 170

JO - International Journal of Multimedia Information Retrieval

JF - International Journal of Multimedia Information Retrieval

SN - 2192-6611

IS - 2

ER -
