A Multimodal Approach for Semantic Patent Image Retrieval

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › Peer-reviewed

Authors

  • Kader Pustu-Iren
  • Gerrit Bruns
  • Ralph Ewerth

Organisational units

External organisations

  • Technische Informationsbibliothek (TIB) Leibniz-Informationszentrum Technik und Naturwissenschaften und Universitätsbibliothek

Details

Original language: English
Title of host publication: Patent Text Mining and Semantic Technologies 2021
Subtitle: Proceedings of the 2nd Workshop on Patent Text Mining and Semantic Technologies (PatentSemTech) 2021 co-located with the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021)
Pages: 45-49
Number of pages: 5
Publication status: Published - 2021
Event: 2nd Workshop on Patent Text Mining and Semantic Technologies, PatentSemTech 2021 - Virtual, Online
Duration: 15 July 2021 → …

Publication series

Name: CEUR Workshop Proceedings
Publisher: CEUR Workshop Proceedings
Volume: 2909
ISSN (Print): 1613-0073

Abstract

Patent images such as technical drawings contain valuable information and are frequently used by experts to compare patents. However, current approaches to patent information retrieval are largely focused on textual information. Consequently, we review previous work on patent retrieval with a focus on illustrations in figures. In this paper, we report on work in progress for a novel approach for patent image retrieval that uses deep multimodal features. Scene text spotting and optical character recognition are employed to extract numerals from an image to subsequently identify references to corresponding sentences in the patent document. Furthermore, we use a neural state-of-the-art CLIP model to extract structural features from illustrations and additionally derive textual features from the related patent text using a sentence transformer model. To fuse our multimodal features for similarity search we apply re-ranking according to averaged or maximum scores. In our experiments, we compare the impact of different modalities on the task of similarity search for patent images. The experimental results suggest that patent image retrieval can be successfully performed using the proposed feature sets, while the best results are achieved when combining the features of both modalities.
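The late-fusion step mentioned in the abstract (re-ranking according to averaged or maximum scores) can be illustrated with a minimal sketch. This is not the authors' implementation: the use of cosine similarity, the toy two-dimensional vectors, and the names cosine, fuse, rerank, QUERY and CANDIDATES are all assumptions made for illustration. In practice the image vectors would come from a CLIP model and the text vectors from a sentence transformer.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fuse(image_score, text_score, mode="avg"):
    """Combine the two per-modality scores by averaging or by taking the maximum."""
    if mode == "avg":
        return (image_score + text_score) / 2
    if mode == "max":
        return max(image_score, text_score)
    raise ValueError("mode must be 'avg' or 'max'")


def rerank(query, candidates, mode="avg"):
    """Return (candidate_id, fused_score) pairs sorted by descending fused score.

    query and each candidate are dicts holding an 'image' and a 'text' vector
    (hypothetical layout, standing in for precomputed embeddings).
    """
    scored = []
    for cid, feats in candidates.items():
        s_img = cosine(query["image"], feats["image"])
        s_txt = cosine(query["text"], feats["text"])
        scored.append((cid, fuse(s_img, s_txt, mode)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data only; real embeddings have hundreds of dimensions.
QUERY = {"image": [1.0, 0.0], "text": [1.0, 0.0]}
CANDIDATES = {
    "a": {"image": [1.0, 0.0], "text": [0.0, 1.0]},  # strong image match, no text match
    "b": {"image": [1.0, 1.0], "text": [1.0, 1.0]},  # moderate match in both modalities
    "c": {"image": [0.0, 1.0], "text": [0.0, 1.0]},  # no match in either modality
}

if __name__ == "__main__":
    for mode in ("avg", "max"):
        print(mode, [cid for cid, _ in rerank(QUERY, CANDIDATES, mode)])
```

On this toy data the two strategies disagree: averaging favours candidate b, which matches moderately in both modalities, while the maximum favours candidate a, which matches strongly in only one.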


Cite this

A Multimodal Approach for Semantic Patent Image Retrieval. / Pustu-Iren, Kader; Bruns, Gerrit; Ewerth, Ralph.
Patent Text Mining and Semantic Technologies 2021: Proceedings of the 2nd Workshop on Patent Text Mining and Semantic Technologies (PatentSemTech) 2021 co-located with the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021). 2021. pp. 45-49 (CEUR Workshop Proceedings; Vol. 2909).


Pustu-Iren, K, Bruns, G & Ewerth, R 2021, A Multimodal Approach for Semantic Patent Image Retrieval. in Patent Text Mining and Semantic Technologies 2021: Proceedings of the 2nd Workshop on Patent Text Mining and Semantic Technologies (PatentSemTech) 2021 co-located with the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021). CEUR Workshop Proceedings, vol. 2909, pp. 45-49, 2nd Workshop on Patent Text Mining and Semantic Technologies, PatentSemTech 2021, Virtual, Online, 15 July 2021. <https://ceur-ws.org/Vol-2909/paper6.pdf>
Pustu-Iren, K., Bruns, G., & Ewerth, R. (2021). A Multimodal Approach for Semantic Patent Image Retrieval. In Patent Text Mining and Semantic Technologies 2021: Proceedings of the 2nd Workshop on Patent Text Mining and Semantic Technologies (PatentSemTech) 2021 co-located with the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021) (pp. 45-49). (CEUR Workshop Proceedings; Vol. 2909). https://ceur-ws.org/Vol-2909/paper6.pdf
Pustu-Iren K, Bruns G, Ewerth R. A Multimodal Approach for Semantic Patent Image Retrieval. in Patent Text Mining and Semantic Technologies 2021: Proceedings of the 2nd Workshop on Patent Text Mining and Semantic Technologies (PatentSemTech) 2021 co-located with the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021). 2021. pp. 45-49. (CEUR Workshop Proceedings).
Pustu-Iren, Kader ; Bruns, Gerrit ; Ewerth, Ralph. / A Multimodal Approach for Semantic Patent Image Retrieval. Patent Text Mining and Semantic Technologies 2021: Proceedings of the 2nd Workshop on Patent Text Mining and Semantic Technologies (PatentSemTech) 2021 co-located with the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021). 2021. pp. 45-49 (CEUR Workshop Proceedings).
@inproceedings{76a9d6532a0e4fffa76b78a56d8056e4,
title = "A Multimodal Approach for Semantic Patent Image Retrieval",
abstract = "Patent images such as technical drawings contain valuable information and are frequently used by experts to compare patents. However, current approaches to patent information retrieval are largely focused on textual information. Consequently, we review previous work on patent retrieval with a focus on illustrations in figures. In this paper, we report on work in progress for a novel approach for patent image retrieval that uses deep multimodal features. Scene text spotting and optical character recognition are employed to extract numerals from an image to subsequently identify references to corresponding sentences in the patent document. Furthermore, we use a neural state-of-the-art CLIP model to extract structural features from illustrations and additionally derive textual features from the related patent text using a sentence transformer model. To fuse our multimodal features for similarity search we apply re-ranking according to averaged or maximum scores. In our experiments, we compare the impact of different modalities on the task of similarity search for patent images. The experimental results suggest that patent image retrieval can be successfully performed using the proposed feature sets, while the best results are achieved when combining the features of both modalities.",
keywords = "Deep learning, Multimodal feature representations, Patent image similarity search, Scene text spotting",
author = "Kader Pustu-Iren and Gerrit Bruns and Ralph Ewerth",
note = "Funding Information: We would like to sincerely thank the reviewers for their valuable and comprehensive comments. This work is financially supported by the Federal Ministry of Education and Research (BMBF, Bundesministerium f{\"u}r Bildung und Forschung, project reference 01IO2004A). ; 2nd Workshop on Patent Text Mining and Semantic Technologies, PatentSemTech 2021 ; Conference date: 15-07-2021",
year = "2021",
language = "English",
series = "CEUR Workshop Proceedings",
publisher = "CEUR Workshop Proceedings",
pages = "45--49",
booktitle = "Patent Text Mining and Semantic Technologies 2021",

}


TY - GEN

T1 - A Multimodal Approach for Semantic Patent Image Retrieval

AU - Pustu-Iren, Kader

AU - Bruns, Gerrit

AU - Ewerth, Ralph

N1 - Funding Information: We would like to sincerely thank the reviewers for their valuable and comprehensive comments. This work is financially supported by the Federal Ministry of Education and Research (BMBF, Bundesministerium für Bildung und Forschung, project reference 01IO2004A).

PY - 2021

Y1 - 2021

N2 - Patent images such as technical drawings contain valuable information and are frequently used by experts to compare patents. However, current approaches to patent information retrieval are largely focused on textual information. Consequently, we review previous work on patent retrieval with a focus on illustrations in figures. In this paper, we report on work in progress for a novel approach for patent image retrieval that uses deep multimodal features. Scene text spotting and optical character recognition are employed to extract numerals from an image to subsequently identify references to corresponding sentences in the patent document. Furthermore, we use a neural state-of-the-art CLIP model to extract structural features from illustrations and additionally derive textual features from the related patent text using a sentence transformer model. To fuse our multimodal features for similarity search we apply re-ranking according to averaged or maximum scores. In our experiments, we compare the impact of different modalities on the task of similarity search for patent images. The experimental results suggest that patent image retrieval can be successfully performed using the proposed feature sets, while the best results are achieved when combining the features of both modalities.

AB - Patent images such as technical drawings contain valuable information and are frequently used by experts to compare patents. However, current approaches to patent information retrieval are largely focused on textual information. Consequently, we review previous work on patent retrieval with a focus on illustrations in figures. In this paper, we report on work in progress for a novel approach for patent image retrieval that uses deep multimodal features. Scene text spotting and optical character recognition are employed to extract numerals from an image to subsequently identify references to corresponding sentences in the patent document. Furthermore, we use a neural state-of-the-art CLIP model to extract structural features from illustrations and additionally derive textual features from the related patent text using a sentence transformer model. To fuse our multimodal features for similarity search we apply re-ranking according to averaged or maximum scores. In our experiments, we compare the impact of different modalities on the task of similarity search for patent images. The experimental results suggest that patent image retrieval can be successfully performed using the proposed feature sets, while the best results are achieved when combining the features of both modalities.

KW - Deep learning

KW - Multimodal feature representations

KW - Patent image similarity search

KW - Scene text spotting

UR - http://www.scopus.com/inward/record.url?scp=85111008683&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85111008683

T3 - CEUR Workshop Proceedings

SP - 45

EP - 49

BT - Patent Text Mining and Semantic Technologies 2021

T2 - 2nd Workshop on Patent Text Mining and Semantic Technologies, PatentSemTech 2021

Y2 - 15 July 2021

ER -