Siamese coding network and pair similarity prediction for near-duplicate image detection

Marco Fisichella

doi:10.1007/s13735-022-00233-w

Details

Original language	English
Pages (from-to)	159-170
Number of pages	12
Journal	International Journal of Multimedia Information Retrieval
Volume	11
Issue number	2
Early online date	12 Apr 2022
Publication status	Published - Jun 2022

Abstract

Near-duplicate detection in a dataset involves finding the elements that are closest to a new query element according to a given similarity function and proximity threshold. The brute-force approach is computationally expensive because it evaluates the similarity between the query item and every item in the dataset. One potential application is an image-sharing website that checks for plagiarism or piracy each time a new image is uploaded. Among the various approaches, near-duplicate detection was effectively addressed by SimPair LSH (Fisichella et al., in Decker, Lhotská, Link, Spies, Wagner (eds) Database and expert systems applications, Springer, 2014). As the name suggests, SimPair LSH relies on locality-sensitive hashing (LSH); in addition, it precomputes and stores a small set of near-duplicate pairs present in the dataset and uses them, via the triangle inequality, to reduce the candidate set returned for a given query. We develop an algorithm that predicts how the candidate set will be reduced. We also develop a new, efficient method for near-duplicate image detection based on a deep Siamese coding neural network that extracts effective image features for building LSH indices. Extensive experiments on two benchmark datasets confirm the effectiveness of our deep Siamese coding network and prediction algorithm.
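The abstract combines two query-time ideas: an LSH index that returns a candidate set for a query, and SimPair LSH's use of precomputed near-duplicate pairs to shrink that set via the triangle inequality. The following is a minimal sketch of both, not the paper's implementation, under stated assumptions: it uses Euclidean distance with a p-stable LSH family as stand-ins, the feature vectors are random rather than produced by the Siamese coding network, the prediction algorithm is not covered, and the names `PStableLSH`, `precompute_pairs`, and `search` and the parameters `k`, `n_tables`, `w`, and `pair_radius` are hypothetical.

```python
import math
import random
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance; it is a metric, so the triangle inequality holds."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class PStableLSH:
    """Assumed LSH family for Euclidean distance: h(v) = floor((a . v + b) / w),
    with Gaussian a and uniform b; each table keys items by k such hashes."""

    def __init__(self, dim, k=4, n_tables=6, w=4.0, seed=0):
        rng = random.Random(seed)
        self.w = w
        self.funcs = [[([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
                       for _ in range(k)] for _ in range(n_tables)]
        self.tables = [defaultdict(set) for _ in range(n_tables)]

    def _key(self, t, v):
        return tuple(math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.w)
                     for a, b in self.funcs[t])

    def add(self, item_id, v):
        for t, table in enumerate(self.tables):
            table[self._key(t, v)].add(item_id)

    def candidates(self, q):
        """Candidate set: union of the buckets q falls into, one per table."""
        out = set()
        for t, table in enumerate(self.tables):
            out |= table.get(self._key(t, q), set())
        return out


def precompute_pairs(data, pair_radius):
    """Hypothetical stand-in for SimPair's stored pairs: every (x, y) with
    d(x, y) <= pair_radius, found by brute force purely for illustration."""
    pairs = defaultdict(list)
    ids = sorted(data)
    for i, x in enumerate(ids):
        for y in ids[i + 1:]:
            d = euclidean(data[x], data[y])
            if d <= pair_radius:
                pairs[x].append((y, d))
                pairs[y].append((x, d))
    return pairs


def search(q, candidates, data, r, pairs=None):
    """Return (items within radius r of q, number of distance computations).
    If x is checked and (x, y) is a stored pair, y is skipped when
    d(q, x) - d(x, y) > r, since d(q, y) >= d(q, x) - d(x, y)."""
    pending = set(candidates)
    found, computed = set(), 0
    for x in sorted(candidates):
        if x not in pending:
            continue  # pruned earlier via the triangle inequality
        pending.discard(x)
        d_qx = euclidean(q, data[x])
        computed += 1
        if d_qx <= r:
            found.add(x)
        for y, d_xy in (pairs or {}).get(x, []):
            if y in pending and d_qx - d_xy > r:
                pending.discard(y)  # provably farther than r from q
    return found, computed


# Usage on random 8-dimensional vectors (stand-ins for learned image features).
rng = random.Random(42)
data = {i: [rng.gauss(0.0, 1.0) for _ in range(8)] for i in range(300)}
index = PStableLSH(dim=8)
for i, v in data.items():
    index.add(i, v)
q = [x + rng.gauss(0.0, 0.05) for x in data[7]]  # a slightly perturbed copy of item 7
cands = index.candidates(q)
plain, n_plain = search(q, cands, data, r=0.5)
pruned, n_pruned = search(q, cands, data, r=0.5, pairs=precompute_pairs(data, 2.0))
print(len(cands), n_plain, n_pruned, sorted(pruned))
```

Because a candidate y is discarded only when the lower bound d(q, x) - d(x, y) already exceeds r, `search` returns the same near-duplicates with and without `pairs`; any saving appears as fewer distance computations. The sketch computes the stored pairs by brute force only to stay short; how large and how tight that stored set should be is exactly the trade-off a prediction of the candidate-set reduction would inform.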

Keywords

Deep features extraction, High-dimensional datasets, Indexing methods, Locality sensitive hashing, Near-duplicate image detection

ASJC Scopus subject areas

Computer Science(all)
Information Systems
Engineering(all)
Media Technology
Social Sciences(all)
Library and Information Sciences

Cite this

Siamese coding network and pair similarity prediction for near-duplicate image detection. / Fisichella, Marco.
In: International Journal of Multimedia Information Retrieval, Vol. 11, No. 2, 06.2022, p. 159-170.

Research output: Contribution to journal › Article › Research › peer review

Fisichella, M 2022, 'Siamese coding network and pair similarity prediction for near-duplicate image detection', International Journal of Multimedia Information Retrieval, vol. 11, no. 2, pp. 159-170. https://doi.org/10.1007/s13735-022-00233-w

Fisichella, M. (2022). Siamese coding network and pair similarity prediction for near-duplicate image detection. International Journal of Multimedia Information Retrieval, 11(2), 159-170. https://doi.org/10.1007/s13735-022-00233-w

Fisichella M. Siamese coding network and pair similarity prediction for near-duplicate image detection. International Journal of Multimedia Information Retrieval. 2022 Jun;11(2):159-170. Epub 2022 Apr 12. doi: 10.1007/s13735-022-00233-w

Fisichella, Marco. / Siamese coding network and pair similarity prediction for near-duplicate image detection. In: International Journal of Multimedia Information Retrieval. 2022 ; Vol. 11, No. 2. pp. 159-170.


@article{bc6678b9c49b40469dcfc2b45d685889,

title = "Siamese coding network and pair similarity prediction for near-duplicate image detection",

abstract = "Near-duplicate detection in a dataset involves finding the elements that are closest to a new query element according to a given similarity function and proximity threshold. The brute force approach is very computationally intensive as it evaluates the similarity between the queried item and all items in the dataset. The potential application domain is an image sharing website that checks for plagiarism or piracy every time a new image is uploaded. Among the various approaches, near-duplicate detection was effectively addressed by SimPair LSH (Fisichella et al., in Decker, Lhotsk{\'a}, Link, Spies, Wagner (eds) Database and expert systems applications, Springer, 2014). As the name suggests, SimPair LSH uses locality sensitive hashing (LSH) and computes and stores in advance a small set of near-duplicate pairs present in the dataset and uses them to reduce the candidate set returned for a given query using the Triangle inequality. We develop an algorithm that predicts how the candidate set will be reduced. We also develop a new efficient method for near-duplicate image detection using a deep Siamese coding neural network that is able to extract effective features from images useful for building LSH indices. Extensive experiments on two benchmark datasets confirm the effectiveness of our deep Siamese coding network and prediction algorithm.",

keywords = "Deep features extraction, High-dimensional datasets, Indexing methods, Locality sensitive hashing, Near-duplicate image detection",

author = "Marco Fisichella",

journal = "International Journal of Multimedia Information Retrieval",

note = "Funding Information: This work was partly funded by the SoMeCliCS project under the Volkswagen Stiftung und Nieders{\"a}chsisches Ministerium f{\"u}r Wissenschaft und Kultur. ",

year = "2022",

month = jun,

doi = "10.1007/s13735-022-00233-w",

language = "English",

volume = "11",

pages = "159--170",

number = "2",

}


TY - JOUR

T1 - Siamese coding network and pair similarity prediction for near-duplicate image detection

AU - Fisichella, Marco

N1 - Funding Information: This work was partly funded by the SoMeCliCS project under the Volkswagen Stiftung und Niedersächsisches Ministerium für Wissenschaft und Kultur.

PY - 2022/6

Y1 - 2022/6

N2 - Near-duplicate detection in a dataset involves finding the elements that are closest to a new query element according to a given similarity function and proximity threshold. The brute force approach is very computationally intensive as it evaluates the similarity between the queried item and all items in the dataset. The potential application domain is an image sharing website that checks for plagiarism or piracy every time a new image is uploaded. Among the various approaches, near-duplicate detection was effectively addressed by SimPair LSH (Fisichella et al., in Decker, Lhotská, Link, Spies, Wagner (eds) Database and expert systems applications, Springer, 2014). As the name suggests, SimPair LSH uses locality sensitive hashing (LSH) and computes and stores in advance a small set of near-duplicate pairs present in the dataset and uses them to reduce the candidate set returned for a given query using the Triangle inequality. We develop an algorithm that predicts how the candidate set will be reduced. We also develop a new efficient method for near-duplicate image detection using a deep Siamese coding neural network that is able to extract effective features from images useful for building LSH indices. Extensive experiments on two benchmark datasets confirm the effectiveness of our deep Siamese coding network and prediction algorithm.

AB - Near-duplicate detection in a dataset involves finding the elements that are closest to a new query element according to a given similarity function and proximity threshold. The brute force approach is very computationally intensive as it evaluates the similarity between the queried item and all items in the dataset. The potential application domain is an image sharing website that checks for plagiarism or piracy every time a new image is uploaded. Among the various approaches, near-duplicate detection was effectively addressed by SimPair LSH (Fisichella et al., in Decker, Lhotská, Link, Spies, Wagner (eds) Database and expert systems applications, Springer, 2014). As the name suggests, SimPair LSH uses locality sensitive hashing (LSH) and computes and stores in advance a small set of near-duplicate pairs present in the dataset and uses them to reduce the candidate set returned for a given query using the Triangle inequality. We develop an algorithm that predicts how the candidate set will be reduced. We also develop a new efficient method for near-duplicate image detection using a deep Siamese coding neural network that is able to extract effective features from images useful for building LSH indices. Extensive experiments on two benchmark datasets confirm the effectiveness of our deep Siamese coding network and prediction algorithm.

KW - Deep features extraction

KW - High-dimensional datasets

KW - Indexing methods

KW - Locality sensitive hashing

KW - Near-duplicate image detection

UR - http://www.scopus.com/inward/record.url?scp=85128067967&partnerID=8YFLogxK

U2 - 10.1007/s13735-022-00233-w

DO - 10.1007/s13735-022-00233-w

M3 - Article

AN - SCOPUS:85128067967

VL - 11

SP - 159

EP - 170

JO - International Journal of Multimedia Information Retrieval

JF - International Journal of Multimedia Information Retrieval

SN - 2192-6611

IS - 2

ER -

Research@Leibniz University


By the same author(s)

Harnessing Empathy and Ethics for Relevance Detection and Information Categorization in Climate and COVID-19 Tweets

A Trustworthy Approach to Classify and Analyze Epidemic-Related Information From Microblogs

LaMMOn: language model combined graph neural network for multi-target multi-camera tracking in online scenarios

Open benchmark for filtering techniques in entity resolution

Does a language model “understand” high school math? A survey of deep learning based word problem solvers