A Multimodal Approach for Semantic Patent Image Retrieval

Research output: Chapter in book/report/conference proceeding; Conference contribution; Research; peer review

Authors

  • Kader Pustu-Iren
  • Gerrit Bruns
  • Ralph Ewerth

Research Organisations

External Research Organisations

  • German National Library of Science and Technology (TIB)

Details

Original language: English
Title of host publication: Patent Text Mining and Semantic Technologies 2021
Subtitle of host publication: Proceedings of the 2nd Workshop on Patent Text Mining and Semantic Technologies (PatentSemTech) 2021 co-located with the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021)
Pages: 45-49
Number of pages: 5
Publication status: Published - 2021
Event: 2nd Workshop on Patent Text Mining and Semantic Technologies, PatentSemTech 2021 - Virtual, Online
Duration: 15 Jul 2021 → …

Publication series

Name: CEUR Workshop Proceedings
Publisher: CEUR Workshop Proceedings
Volume: 2909
ISSN (Print): 1613-0073

Abstract

Patent images such as technical drawings contain valuable information and are frequently used by experts to compare patents. However, current approaches to patent information retrieval are largely focused on textual information. Consequently, we review previous work on patent retrieval with a focus on illustrations in figures. In this paper, we report on work in progress for a novel approach for patent image retrieval that uses deep multimodal features. Scene text spotting and optical character recognition are employed to extract numerals from an image to subsequently identify references to corresponding sentences in the patent document. Furthermore, we use a neural state-of-the-art CLIP model to extract structural features from illustrations and additionally derive textual features from the related patent text using a sentence transformer model. To fuse our multimodal features for similarity search we apply re-ranking according to averaged or maximum scores. In our experiments, we compare the impact of different modalities on the task of similarity search for patent images. The experimental results suggest that patent image retrieval can be successfully performed using the proposed feature sets, while the best results are achieved when combining the features of both modalities.
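
The two steps named in the abstract, linking extracted reference numerals to sentences and fusing per-modality scores by average or maximum, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the similarity measure, so cosine similarity is an assumption here, and the function names (`sentences_for_numerals`, `rerank`), the naive sentence splitter, and the toy vectors standing in for CLIP and sentence-transformer embeddings are all hypothetical.

```python
import math
import re


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sentences_for_numerals(numerals, patent_text):
    """Return sentences mentioning any reference numeral as a standalone token.

    Assumption: sentences are split naively after '.', '!' or '?'.
    """
    sentences = re.split(r"(?<=[.!?])\s+", patent_text.strip())
    patterns = [re.compile(rf"\b{re.escape(n)}\b") for n in numerals]
    return [s for s in sentences if any(p.search(s) for p in patterns)]


def rerank(query, candidates, fusion="mean"):
    """Rank candidate ids by fused image/text similarity to the query.

    query and each candidate are dicts with 'image' and 'text' vectors
    (placeholders for the image and sentence embeddings).
    fusion is 'mean' (averaged scores) or 'max' (maximum score).
    """
    fused = {}
    for cid, feats in candidates.items():
        s_img = cosine(query["image"], feats["image"])
        s_txt = cosine(query["text"], feats["text"])
        fused[cid] = max(s_img, s_txt) if fusion == "max" else (s_img + s_txt) / 2
    return sorted(fused, key=fused.get, reverse=True)


# Toy data: candidate A matches the query image exactly but not its text;
# candidate B is moderately similar in both modalities.
query = {"image": [1.0, 0.0], "text": [0.0, 1.0]}
candidates = {
    "A": {"image": [1.0, 0.0], "text": [1.0, 0.0]},
    "B": {"image": [1.0, 1.0], "text": [1.0, 1.0]},
}
ranking_mean = rerank(query, candidates, fusion="mean")
ranking_max = rerank(query, candidates, fusion="max")
linked = sentences_for_numerals(
    ["12"], "The housing 12 holds the shaft 14. The shaft 14 rotates. Bolt 112 is fixed."
)
```

On this toy data the two fusion rules disagree: averaging favours the candidate that is moderately similar in both modalities, while the maximum favours the candidate with one very strong modality.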

Keywords

    Deep learning, Multimodal feature representations, Patent image similarity search, Scene text spotting

Cite this

A Multimodal Approach for Semantic Patent Image Retrieval. / Pustu-Iren, Kader; Bruns, Gerrit; Ewerth, Ralph.
Patent Text Mining and Semantic Technologies 2021: Proceedings of the 2nd Workshop on Patent Text Mining and Semantic Technologies (PatentSemTech) 2021 co-located with the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021). 2021. p. 45-49 (CEUR Workshop Proceedings; Vol. 2909).

Pustu-Iren, K, Bruns, G & Ewerth, R 2021, A Multimodal Approach for Semantic Patent Image Retrieval. in Patent Text Mining and Semantic Technologies 2021: Proceedings of the 2nd Workshop on Patent Text Mining and Semantic Technologies (PatentSemTech) 2021 co-located with the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021). CEUR Workshop Proceedings, vol. 2909, pp. 45-49, 2nd Workshop on Patent Text Mining and Semantic Technologies, PatentSemTech 2021, Virtual, Online, 15 Jul 2021. <https://ceur-ws.org/Vol-2909/paper6.pdf>
Pustu-Iren, K., Bruns, G., & Ewerth, R. (2021). A Multimodal Approach for Semantic Patent Image Retrieval. In Patent Text Mining and Semantic Technologies 2021: Proceedings of the 2nd Workshop on Patent Text Mining and Semantic Technologies (PatentSemTech) 2021 co-located with the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021) (pp. 45-49). (CEUR Workshop Proceedings; Vol. 2909). https://ceur-ws.org/Vol-2909/paper6.pdf
Pustu-Iren K, Bruns G, Ewerth R. A Multimodal Approach for Semantic Patent Image Retrieval. In Patent Text Mining and Semantic Technologies 2021: Proceedings of the 2nd Workshop on Patent Text Mining and Semantic Technologies (PatentSemTech) 2021 co-located with the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021). 2021. p. 45-49. (CEUR Workshop Proceedings).
Pustu-Iren, Kader ; Bruns, Gerrit ; Ewerth, Ralph. / A Multimodal Approach for Semantic Patent Image Retrieval. Patent Text Mining and Semantic Technologies 2021: Proceedings of the 2nd Workshop on Patent Text Mining and Semantic Technologies (PatentSemTech) 2021 co-located with the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021). 2021. pp. 45-49 (CEUR Workshop Proceedings).
@inproceedings{76a9d6532a0e4fffa76b78a56d8056e4,
title = "A Multimodal Approach for Semantic Patent Image Retrieval",
abstract = "Patent images such as technical drawings contain valuable information and are frequently used by experts to compare patents. However, current approaches to patent information retrieval are largely focused on textual information. Consequently, we review previous work on patent retrieval with a focus on illustrations in figures. In this paper, we report on work in progress for a novel approach for patent image retrieval that uses deep multimodal features. Scene text spotting and optical character recognition are employed to extract numerals from an image to subsequently identify references to corresponding sentences in the patent document. Furthermore, we use a neural state-of-the-art CLIP model to extract structural features from illustrations and additionally derive textual features from the related patent text using a sentence transformer model. To fuse our multimodal features for similarity search we apply re-ranking according to averaged or maximum scores. In our experiments, we compare the impact of different modalities on the task of similarity search for patent images. The experimental results suggest that patent image retrieval can be successfully performed using the proposed feature sets, while the best results are achieved when combining the features of both modalities.",
keywords = "Deep learning, Multimodal feature representations, Patent image similarity search, Scene text spotting",
author = "Kader Pustu-Iren and Gerrit Bruns and Ralph Ewerth",
note = "Funding Information: We would like to sincerely thank the reviewers for their valuable and comprehensive comments. This work is financially supported by the Federal Ministry of Education and Research (BMBF, Bundesministerium f{\"u}r Bildung und Forschung, project reference 01IO2004A). ; 2nd Workshop on Patent Text Mining and Semantic Technologies, PatentSemTech 2021 ; Conference date: 15-07-2021",
year = "2021",
language = "English",
series = "CEUR Workshop Proceedings",
publisher = "CEUR Workshop Proceedings",
pages = "45--49",
booktitle = "Patent Text Mining and Semantic Technologies 2021",

}

TY - GEN

T1 - A Multimodal Approach for Semantic Patent Image Retrieval

AU - Pustu-Iren, Kader

AU - Bruns, Gerrit

AU - Ewerth, Ralph

N1 - Funding Information: We would like to sincerely thank the reviewers for their valuable and comprehensive comments. This work is financially supported by the Federal Ministry of Education and Research (BMBF, Bundesministerium für Bildung und Forschung, project reference 01IO2004A).

PY - 2021

Y1 - 2021

AB - Patent images such as technical drawings contain valuable information and are frequently used by experts to compare patents. However, current approaches to patent information retrieval are largely focused on textual information. Consequently, we review previous work on patent retrieval with a focus on illustrations in figures. In this paper, we report on work in progress for a novel approach for patent image retrieval that uses deep multimodal features. Scene text spotting and optical character recognition are employed to extract numerals from an image to subsequently identify references to corresponding sentences in the patent document. Furthermore, we use a neural state-of-the-art CLIP model to extract structural features from illustrations and additionally derive textual features from the related patent text using a sentence transformer model. To fuse our multimodal features for similarity search we apply re-ranking according to averaged or maximum scores. In our experiments, we compare the impact of different modalities on the task of similarity search for patent images. The experimental results suggest that patent image retrieval can be successfully performed using the proposed feature sets, while the best results are achieved when combining the features of both modalities.

KW - Deep learning

KW - Multimodal feature representations

KW - Patent image similarity search

KW - Scene text spotting

UR - http://www.scopus.com/inward/record.url?scp=85111008683&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85111008683

T3 - CEUR Workshop Proceedings

SP - 45

EP - 49

BT - Patent Text Mining and Semantic Technologies 2021

T2 - 2nd Workshop on Patent Text Mining and Semantic Technologies, PatentSemTech 2021

Y2 - 15 July 2021

ER -