Siamese coding network and pair similarity prediction for near-duplicate image detection

Research output: Contribution to journal › Article › Research › peer review

Details

Original language: English
Pages (from-to): 159-170
Number of pages: 12
Journal: International Journal of Multimedia Information Retrieval
Volume: 11
Issue number: 2
Early online date: 12 Apr 2022
Publication status: Published - Jun 2022

Abstract

Near-duplicate detection in a dataset involves finding the elements that are closest to a new query element according to a given similarity function and proximity threshold. The brute-force approach is computationally expensive because it evaluates the similarity between the query item and every item in the dataset. A potential application domain is an image-sharing website that checks for plagiarism or piracy every time a new image is uploaded. Among the various approaches, near-duplicate detection was effectively addressed by SimPair LSH (Fisichella et al., in Decker, Lhotská, Link, Spies, Wagner (eds) Database and expert systems applications, Springer, 2014). As the name suggests, SimPair LSH uses locality sensitive hashing (LSH). It also computes and stores in advance a small set of near-duplicate pairs present in the dataset, then uses them together with the triangle inequality to reduce the candidate set returned for a given query. We develop an algorithm that predicts how the candidate set will be reduced. We also develop a new, efficient method for near-duplicate image detection using a deep Siamese coding neural network that extracts effective image features useful for building LSH indices. Extensive experiments on two benchmark datasets confirm the effectiveness of our deep Siamese coding network and prediction algorithm.
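
The candidate-set reduction mentioned in the abstract can be illustrated with a minimal sketch. This is not the SimPair LSH implementation or the paper's prediction algorithm. It assumes a Euclidean distance rather than a similarity function, a distance threshold `tau`, a candidate list that would normally come from LSH buckets, and a hypothetical `stored_pairs` map holding precomputed distances between near-duplicate pairs. All names are illustrative.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def prune_candidates(query, candidates, points, stored_pairs, tau):
    """Return (near_duplicates, number_of_distance_computations).

    points:       id -> feature vector
    stored_pairs: id -> {neighbour_id: precomputed distance}
                  (assumed structure, not taken from the paper)
    tau:          distance threshold for "near-duplicate"
    """
    pruned = set()
    near_duplicates = []
    computed = 0
    for c in candidates:
        if c in pruned:
            continue  # already ruled out without computing its distance
        d_qc = euclidean(query, points[c])
        computed += 1
        if d_qc <= tau:
            near_duplicates.append(c)
        # Triangle inequality: d(q, r) >= d(q, c) - d(c, r).
        # If that lower bound exceeds tau, r cannot be a near-duplicate.
        for r, d_cr in stored_pairs.get(c, {}).items():
            if d_qc - d_cr > tau:
                pruned.add(r)
    return near_duplicates, computed


# Toy data: two tight clusters, one near the query and one far away.
points = {0: (0.0, 0.0), 1: (0.1, 0.0), 2: (10.0, 0.0), 3: (10.1, 0.0)}
stored_pairs = {2: {3: euclidean(points[2], points[3])}}
result, n_computed = prune_candidates(
    (0.0, 0.05), [2, 3, 0, 1], points, stored_pairs, tau=0.5
)
print(result, n_computed)  # [0, 1] 3
```

In this toy run, point 3 is discarded without a distance computation because its stored neighbour, point 2, is already far from the query. With an empty `stored_pairs` map, all four candidates would have to be compared.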

Keywords

    Deep features extraction, High-dimensional datasets, Indexing methods, Locality sensitive hashing, Near-duplicate image detection

Cite this

Siamese coding network and pair similarity prediction for near-duplicate image detection. / Fisichella, Marco.
In: International Journal of Multimedia Information Retrieval, Vol. 11, No. 2, 06.2022, p. 159-170.

Fisichella, M 2022, 'Siamese coding network and pair similarity prediction for near-duplicate image detection', International Journal of Multimedia Information Retrieval, vol. 11, no. 2, pp. 159-170. https://doi.org/10.1007/s13735-022-00233-w
Fisichella M. Siamese coding network and pair similarity prediction for near-duplicate image detection. International Journal of Multimedia Information Retrieval. 2022 Jun;11(2):159-170. Epub 2022 Apr 12. doi: 10.1007/s13735-022-00233-w
Fisichella, Marco. / Siamese coding network and pair similarity prediction for near-duplicate image detection. In: International Journal of Multimedia Information Retrieval. 2022 ; Vol. 11, No. 2. pp. 159-170.
@article{bc6678b9c49b40469dcfc2b45d685889,
title = "Siamese coding network and pair similarity prediction for near-duplicate image detection",
abstract = "Near-duplicate detection in a dataset involves finding the elements that are closest to a new query element according to a given similarity function and proximity threshold. The brute force approach is very computationally intensive as it evaluates the similarity between the queried item and all items in the dataset. The potential application domain is an image sharing website that checks for plagiarism or piracy every time a new image is uploaded. Among the various approaches, near-duplicate detection was effectively addressed by SimPair LSH (Fisichella et al., in Decker, Lhotsk{\'a}, Link, Spies, Wagner (eds) Database and expert systems applications, Springer, 2014). As the name suggests, SimPair LSH uses locality sensitive hashing (LSH) and computes and stores in advance a small set of near-duplicate pairs present in the dataset and uses them to reduce the candidate set returned for a given query using the Triangle inequality. We develop an algorithm that predicts how the candidate set will be reduced. We also develop a new efficient method for near-duplicate image detection using a deep Siamese coding neural network that is able to extract effective features from images useful for building LSH indices. Extensive experiments on two benchmark datasets confirm the effectiveness of our deep Siamese coding network and prediction algorithm.",
keywords = "Deep features extraction, High-dimensional datasets, Indexing methods, Locality sensitive hashing, Near-duplicate image detection",
author = "Marco Fisichella",
note = "Funding Information: This work was partly funded by the SoMeCliCS project under the Volkswagen Stiftung und Nieders{\"a}chsisches Ministerium f{\"u}r Wissenschaft und Kultur. ",
year = "2022",
month = jun,
doi = "10.1007/s13735-022-00233-w",
language = "English",
volume = "11",
pages = "159--170",
number = "2",
journal = "International Journal of Multimedia Information Retrieval",
issn = "2192-6611",
}

TY - JOUR

T1 - Siamese coding network and pair similarity prediction for near-duplicate image detection

AU - Fisichella, Marco

N1 - Funding Information: This work was partly funded by the SoMeCliCS project under the Volkswagen Stiftung und Niedersächsisches Ministerium für Wissenschaft und Kultur.

PY - 2022/6

Y1 - 2022/6

N2 - Near-duplicate detection in a dataset involves finding the elements that are closest to a new query element according to a given similarity function and proximity threshold. The brute force approach is very computationally intensive as it evaluates the similarity between the queried item and all items in the dataset. The potential application domain is an image sharing website that checks for plagiarism or piracy every time a new image is uploaded. Among the various approaches, near-duplicate detection was effectively addressed by SimPair LSH (Fisichella et al., in Decker, Lhotská, Link, Spies, Wagner (eds) Database and expert systems applications, Springer, 2014). As the name suggests, SimPair LSH uses locality sensitive hashing (LSH) and computes and stores in advance a small set of near-duplicate pairs present in the dataset and uses them to reduce the candidate set returned for a given query using the Triangle inequality. We develop an algorithm that predicts how the candidate set will be reduced. We also develop a new efficient method for near-duplicate image detection using a deep Siamese coding neural network that is able to extract effective features from images useful for building LSH indices. Extensive experiments on two benchmark datasets confirm the effectiveness of our deep Siamese coding network and prediction algorithm.

AB - Near-duplicate detection in a dataset involves finding the elements that are closest to a new query element according to a given similarity function and proximity threshold. The brute force approach is very computationally intensive as it evaluates the similarity between the queried item and all items in the dataset. The potential application domain is an image sharing website that checks for plagiarism or piracy every time a new image is uploaded. Among the various approaches, near-duplicate detection was effectively addressed by SimPair LSH (Fisichella et al., in Decker, Lhotská, Link, Spies, Wagner (eds) Database and expert systems applications, Springer, 2014). As the name suggests, SimPair LSH uses locality sensitive hashing (LSH) and computes and stores in advance a small set of near-duplicate pairs present in the dataset and uses them to reduce the candidate set returned for a given query using the Triangle inequality. We develop an algorithm that predicts how the candidate set will be reduced. We also develop a new efficient method for near-duplicate image detection using a deep Siamese coding neural network that is able to extract effective features from images useful for building LSH indices. Extensive experiments on two benchmark datasets confirm the effectiveness of our deep Siamese coding network and prediction algorithm.

KW - Deep features extraction

KW - High-dimensional datasets

KW - Indexing methods

KW - Locality sensitive hashing

KW - Near-duplicate image detection

UR - http://www.scopus.com/inward/record.url?scp=85128067967&partnerID=8YFLogxK

U2 - 10.1007/s13735-022-00233-w

DO - 10.1007/s13735-022-00233-w

M3 - Article

AN - SCOPUS:85128067967

VL - 11

SP - 159

EP - 170

JO - International Journal of Multimedia Information Retrieval

JF - International Journal of Multimedia Information Retrieval

SN - 2192-6611

IS - 2

ER -
