Siamese coding network and pair similarity prediction for near-duplicate image detection

Marco Fisichella

doi:10.1007/s13735-022-00233-w

Details

Original language	English
Pages (from - to)	159-170
Number of pages	12
Journal	International Journal of Multimedia Information Retrieval
Volume	11
Issue number	2
Early online date	12 Apr 2022
Publication status	Published - June 2022

Abstract

Near-duplicate detection in a dataset involves finding the elements that are closest to a new query element according to a given similarity function and proximity threshold. The brute force approach is very computationally intensive, as it evaluates the similarity between the queried item and all items in the dataset. A potential application domain is an image sharing website that checks for plagiarism or piracy every time a new image is uploaded. Among the various approaches, near-duplicate detection was effectively addressed by SimPair LSH (Fisichella et al., in Decker, Lhotská, Link, Spies, Wagner (eds) Database and expert systems applications, Springer, 2014). As the name suggests, SimPair LSH uses locality sensitive hashing (LSH). It also computes and stores in advance a small set of near-duplicate pairs present in the dataset, and uses them, together with the triangle inequality, to reduce the candidate set returned for a given query. We develop an algorithm that predicts how the candidate set will be reduced. We also develop a new efficient method for near-duplicate image detection using a deep Siamese coding neural network that is able to extract effective features from images useful for building LSH indices. Extensive experiments on two benchmark datasets confirm the effectiveness of our deep Siamese coding network and prediction algorithm.
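
The candidate-set reduction described in the abstract can be illustrated with a small sketch. It is not the SimPair LSH implementation from the paper; it only shows how precomputed near-duplicate pairs and the triangle inequality can let a query skip some distance computations. The function names (`build_pair_index`, `query_with_pruning`), the thresholds `theta` and `tau`, the use of Euclidean distance, and the toy data are all assumptions made for illustration. The candidate list stands in for whatever an LSH index would return.

```python
import math
from collections import defaultdict


def build_pair_index(points, theta):
    """Precompute all pairs at distance <= theta (brute force, illustration only)."""
    pairs = defaultdict(dict)
    ids = list(points)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            d = math.dist(points[a], points[b])
            if d <= theta:
                pairs[a][b] = d
                pairs[b][a] = d
    return pairs


def query_with_pruning(q, candidates, points, pairs, tau):
    """Return (near_duplicates, number_of_distance_computations).

    For a candidate c with stored neighbour n, the triangle inequality gives
    d(q, n) >= d(q, c) - d(c, n). If that lower bound already exceeds tau,
    n cannot be a near-duplicate of q and is skipped without being evaluated.
    """
    pruned = set()
    result = []
    computed = 0
    for c in candidates:
        if c in pruned:
            continue
        d_qc = math.dist(q, points[c])
        computed += 1
        if d_qc <= tau:
            result.append(c)
        for n, d_cn in pairs.get(c, {}).items():
            if d_qc - d_cn > tau:
                pruned.add(n)
    return result, computed


# Hypothetical toy data: two tight clusters of 2-D feature vectors.
points = {
    "a": (0.0, 0.0), "b": (0.1, 0.0),
    "c": (5.0, 5.0), "d": (5.1, 5.0), "e": (5.0, 5.1),
}
pairs = build_pair_index(points, theta=0.5)

# Pretend the LSH index returned every item as a candidate.
matches, computed = query_with_pruning(
    (0.0, 0.05), ["a", "b", "c", "d", "e"], points, pairs, tau=0.2
)
print(matches, computed)
```

In this toy run, evaluating `c` once is enough to discard its stored neighbours `d` and `e`, so three distances are computed instead of five. How large that saving is on real data is what the prediction algorithm mentioned in the abstract is meant to estimate.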

ASJC Scopus subject areas

Computer Science (all)
Information Systems
Engineering (all)
Media Technology
Social Sciences (all)
Library and Information Sciences

Cite this

Siamese coding network and pair similarity prediction for near-duplicate image detection. / Fisichella, Marco.
In: International Journal of Multimedia Information Retrieval, Vol. 11, No. 2, 06.2022, p. 159-170.

Research output: Contribution to journal › Article › Research › Peer review

Fisichella, M 2022, 'Siamese coding network and pair similarity prediction for near-duplicate image detection', International Journal of Multimedia Information Retrieval, vol. 11, no. 2, pp. 159-170. https://doi.org/10.1007/s13735-022-00233-w

Fisichella, M. (2022). Siamese coding network and pair similarity prediction for near-duplicate image detection. International Journal of Multimedia Information Retrieval, 11(2), 159-170. https://doi.org/10.1007/s13735-022-00233-w

Fisichella M. Siamese coding network and pair similarity prediction for near-duplicate image detection. International Journal of Multimedia Information Retrieval. 2022 Jun;11(2):159-170. Epub 2022 Apr 12. doi: 10.1007/s13735-022-00233-w

Fisichella, Marco. / Siamese coding network and pair similarity prediction for near-duplicate image detection. In: International Journal of Multimedia Information Retrieval. 2022 ; Vol. 11, No. 2. pp. 159-170.


@article{bc6678b9c49b40469dcfc2b45d685889,

title = "Siamese coding network and pair similarity prediction for near-duplicate image detection",

abstract = "Near-duplicate detection in a dataset involves finding the elements that are closest to a new query element according to a given similarity function and proximity threshold. The brute force approach is very computationally intensive as it evaluates the similarity between the queried item and all items in the dataset. The potential application domain is an image sharing website that checks for plagiarism or piracy every time a new image is uploaded. Among the various approaches, near-duplicate detection was effectively addressed by SimPair LSH (Fisichella et al., in Decker, Lhotsk{\'a}, Link, Spies, Wagner (eds) Database and expert systems applications, Springer, 2014). As the name suggests, SimPair LSH uses locality sensitive hashing (LSH) and computes and stores in advance a small set of near-duplicate pairs present in the dataset and uses them to reduce the candidate set returned for a given query using the Triangle inequality. We develop an algorithm that predicts how the candidate set will be reduced. We also develop a new efficient method for near-duplicate image detection using a deep Siamese coding neural network that is able to extract effective features from images useful for building LSH indices. Extensive experiments on two benchmark datasets confirm the effectiveness of our deep Siamese coding network and prediction algorithm.",

keywords = "Deep features extraction, High-dimensional datasets, Indexing methods, Locality sensitive hashing, Near-duplicate image detection",

author = "Marco Fisichella",

note = "Funding Information: This work was partly funded by the SoMeCliCS project under the Volkswagen Stiftung und Nieders{\"a}chsisches Ministerium f{\"u}r Wissenschaft und Kultur. ",

year = "2022",

month = jun,

doi = "10.1007/s13735-022-00233-w",

language = "English",

volume = "11",

pages = "159--170",

number = "2",

}


TY - JOUR

T1 - Siamese coding network and pair similarity prediction for near-duplicate image detection

AU - Fisichella, Marco

N1 - Funding Information: This work was partly funded by the SoMeCliCS project under the Volkswagen Stiftung und Niedersächsisches Ministerium für Wissenschaft und Kultur.

PY - 2022/6

Y1 - 2022/6

N2 - Near-duplicate detection in a dataset involves finding the elements that are closest to a new query element according to a given similarity function and proximity threshold. The brute force approach is very computationally intensive as it evaluates the similarity between the queried item and all items in the dataset. The potential application domain is an image sharing website that checks for plagiarism or piracy every time a new image is uploaded. Among the various approaches, near-duplicate detection was effectively addressed by SimPair LSH (Fisichella et al., in Decker, Lhotská, Link, Spies, Wagner (eds) Database and expert systems applications, Springer, 2014). As the name suggests, SimPair LSH uses locality sensitive hashing (LSH) and computes and stores in advance a small set of near-duplicate pairs present in the dataset and uses them to reduce the candidate set returned for a given query using the Triangle inequality. We develop an algorithm that predicts how the candidate set will be reduced. We also develop a new efficient method for near-duplicate image detection using a deep Siamese coding neural network that is able to extract effective features from images useful for building LSH indices. Extensive experiments on two benchmark datasets confirm the effectiveness of our deep Siamese coding network and prediction algorithm.

AB - Near-duplicate detection in a dataset involves finding the elements that are closest to a new query element according to a given similarity function and proximity threshold. The brute force approach is very computationally intensive as it evaluates the similarity between the queried item and all items in the dataset. The potential application domain is an image sharing website that checks for plagiarism or piracy every time a new image is uploaded. Among the various approaches, near-duplicate detection was effectively addressed by SimPair LSH (Fisichella et al., in Decker, Lhotská, Link, Spies, Wagner (eds) Database and expert systems applications, Springer, 2014). As the name suggests, SimPair LSH uses locality sensitive hashing (LSH) and computes and stores in advance a small set of near-duplicate pairs present in the dataset and uses them to reduce the candidate set returned for a given query using the Triangle inequality. We develop an algorithm that predicts how the candidate set will be reduced. We also develop a new efficient method for near-duplicate image detection using a deep Siamese coding neural network that is able to extract effective features from images useful for building LSH indices. Extensive experiments on two benchmark datasets confirm the effectiveness of our deep Siamese coding network and prediction algorithm.

KW - Deep features extraction

KW - High-dimensional datasets

KW - Indexing methods

KW - Locality sensitive hashing

KW - Near-duplicate image detection

UR - http://www.scopus.com/inward/record.url?scp=85128067967&partnerID=8YFLogxK

U2 - 10.1007/s13735-022-00233-w

DO - 10.1007/s13735-022-00233-w

M3 - Article

AN - SCOPUS:85128067967

VL - 11

SP - 159

EP - 170

JO - International Journal of Multimedia Information Retrieval

JF - International Journal of Multimedia Information Retrieval

SN - 2192-6611

IS - 2

ER -
