Visual Narratives: Large-scale Hierarchical Classification of Art-historical Images

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Authors

  • Matthias Springstein
  • Stefanie Schneider
  • Javad Rahnama
  • Julian Stalter
  • Maximilian Kristen
  • Eric Muller-Budack
  • Ralph Ewerth

Research Organisations

External Research Organisations

  • German National Library of Science and Technology (TIB)
  • Ludwig-Maximilians-Universität München (LMU)
  • Reply Deutschland SE

Details

Original language: English
Title of host publication: 2024 IEEE Winter Conference on Applications of Computer Vision
Subtitle of host publication: WACV
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 7195-7205
Number of pages: 11
ISBN (electronic): 9798350318920
ISBN (print): 979-8-3503-1893-7
Publication status: Published - 2024
Event: IEEE/CVF Winter Conference on Applications of Computer Vision 2024 - Waikoloa, United States
Duration: 3 Jan 2024 to 8 Jan 2024

Abstract

Iconography refers to the methodical study and interpretation of thematic content in the visual arts, distinguishing it, e.g., from purely formal or aesthetic considerations. In iconographic studies, Iconclass is a widely used taxonomy that encapsulates historical, biblical, and literary themes, among others. However, given the hierarchical nature and inherent complexity of such a taxonomy, it is highly desirable to use automated methods for (Iconclass-based) image classification. Previous studies either focused narrowly on certain subsets of narratives or failed to exploit Iconclass's hierarchical structure. In this paper, we propose a novel approach for Hierarchical Multi-label Classification (HMC) of iconographic concepts in images. We present three strategies, including Language Models (LMs), for the generation of textual image descriptions using keywords extracted from Iconclass. These descriptions are utilized to pre-train a Vision-Language Model (VLM) based on a newly introduced data set of 477,569 images with more than 20,000 Iconclass concepts, far more than considered in previous studies. Furthermore, we present five approaches to multi-label classification, including a novel transformer decoder that leverages hierarchical information from the Iconclass taxonomy. Experimental results show the superiority of this approach over reasonable baselines.
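
To illustrate what "hierarchical" means for multi-label targets, the following is a minimal sketch and not the authors' method. It assumes plain Iconclass notations (no bracketed text or keys) in which each additional character descends one level, so every prefix of a notation is one of its ancestors. The function names and the example notations are hypothetical and chosen only for illustration.

```python
def ancestors(notation: str) -> list[str]:
    """Return every prefix of a plain notation, top level first.

    Assumption: each extra character is one level deeper in the taxonomy.
    """
    return [notation[:i] for i in range(1, len(notation) + 1)]


def expand_labels(notations: list[str]) -> list[str]:
    """Union of the given labels and all their ancestors, in sorted order."""
    expanded = set()
    for notation in notations:
        expanded.update(ancestors(notation))
    return sorted(expanded)


def multi_hot(labels: list[str], vocabulary: list[str]) -> list[int]:
    """Encode labels as a 0/1 vector over a fixed vocabulary of concepts."""
    index = {label: i for i, label in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for label in labels:
        if label in index:  # labels outside the vocabulary are ignored
            vector[index[label]] = 1
    return vector


if __name__ == "__main__":
    labels = expand_labels(["71H7", "73D"])
    print(labels)
    print(multi_hot(labels, ["7", "71", "71H", "71H7", "73", "73D", "9"]))
```

A classifier trained on such expanded targets is asked to predict a concept together with all of its parent concepts, which is one simple way to expose the taxonomy to a flat multi-label model.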

Keywords

    Algorithms, Applications, Arts / games / social media, Image recognition and understanding, Vision + language and/or other modalities

Cite this

Visual Narratives: Large-scale Hierarchical Classification of Art-historical Images. / Springstein, Matthias; Schneider, Stefanie; Rahnama, Javad et al.
2024 IEEE Winter Conference on Applications of Computer Vision: WACV. Institute of Electrical and Electronics Engineers Inc., 2024. p. 7195-7205.

Springstein, M, Schneider, S, Rahnama, J, Stalter, J, Kristen, M, Muller-Budack, E & Ewerth, R 2024, Visual Narratives: Large-scale Hierarchical Classification of Art-historical Images. in 2024 IEEE Winter Conference on Applications of Computer Vision: WACV. Institute of Electrical and Electronics Engineers Inc., pp. 7195-7205, IEEE/CVF Winter Conference on Applications of Computer Vision 2024, Waikoloa, United States, 3 Jan 2024. https://doi.org/10.1109/WACV57701.2024.00705
Springstein, M., Schneider, S., Rahnama, J., Stalter, J., Kristen, M., Muller-Budack, E., & Ewerth, R. (2024). Visual Narratives: Large-scale Hierarchical Classification of Art-historical Images. In 2024 IEEE Winter Conference on Applications of Computer Vision: WACV (pp. 7195-7205). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/WACV57701.2024.00705
Springstein M, Schneider S, Rahnama J, Stalter J, Kristen M, Muller-Budack E et al. Visual Narratives: Large-scale Hierarchical Classification of Art-historical Images. In 2024 IEEE Winter Conference on Applications of Computer Vision: WACV. Institute of Electrical and Electronics Engineers Inc. 2024. p. 7195-7205 doi: 10.1109/WACV57701.2024.00705
Springstein, Matthias; Schneider, Stefanie; Rahnama, Javad et al. / Visual Narratives: Large-scale Hierarchical Classification of Art-historical Images. 2024 IEEE Winter Conference on Applications of Computer Vision: WACV. Institute of Electrical and Electronics Engineers Inc., 2024. pp. 7195-7205
@inproceedings{265dfa5d21244e939d29f92bf0829bac,
title = "Visual Narratives: Large-scale Hierarchical Classification of Art-historical Images",
abstract = "Iconography refers to the methodical study and interpretation of thematic content in the visual arts, distinguishing it, e.g., from purely formal or aesthetic considerations. In iconographic studies, Iconclass is a widely used taxonomy that encapsulates historical, biblical, and literary themes, among others. However, given the hierarchical nature and inherent complexity of such a taxonomy, it is highly desirable to use automated methods for (Iconclass-based) image classification. Previous studies either focused narrowly on certain subsets of narratives or failed to exploit Iconclass's hierarchical structure. In this paper, we propose a novel approach for Hierarchical Multi-label Classification (HMC) of iconographic concepts in images. We present three strategies, including Language Models (LMs), for the generation of textual image descriptions using keywords extracted from Iconclass. These descriptions are utilized to pre-train a Vision-Language Model (VLM) based on a newly introduced data set of 477,569 images with more than 20,000 Iconclass concepts, far more than considered in previous studies. Furthermore, we present five approaches to multi-label classification, including a novel transformer decoder that leverages hierarchical information from the Iconclass taxonomy. Experimental results show the superiority of this approach over reasonable baselines.",
keywords = "Algorithms, Applications, Arts / games / social media, Image recognition and understanding, Vision + language and/or other modalities",
author = "Matthias Springstein and Stefanie Schneider and Javad Rahnama and Julian Stalter and Maximilian Kristen and Eric Muller-Budack and Ralph Ewerth",
note = "Publisher Copyright: {\textcopyright} 2024 IEEE.; IEEE/CVF Winter Conference on Applications of Computer Vision 2024, WACV ; Conference date: 03-01-2024 Through 08-01-2024",
year = "2024",
doi = "10.1109/WACV57701.2024.00705",
language = "English",
isbn = "979-8-3503-1893-7",
pages = "7195--7205",
booktitle = "2024 IEEE Winter Conference on Applications of Computer Vision",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}

TY - GEN

T1 - Visual Narratives

T2 - IEEE/CVF Winter Conference on Applications of Computer Vision 2024

AU - Springstein, Matthias

AU - Schneider, Stefanie

AU - Rahnama, Javad

AU - Stalter, Julian

AU - Kristen, Maximilian

AU - Muller-Budack, Eric

AU - Ewerth, Ralph

N1 - Publisher Copyright: © 2024 IEEE.

PY - 2024

Y1 - 2024

AB - Iconography refers to the methodical study and interpretation of thematic content in the visual arts, distinguishing it, e.g., from purely formal or aesthetic considerations. In iconographic studies, Iconclass is a widely used taxonomy that encapsulates historical, biblical, and literary themes, among others. However, given the hierarchical nature and inherent complexity of such a taxonomy, it is highly desirable to use automated methods for (Iconclass-based) image classification. Previous studies either focused narrowly on certain subsets of narratives or failed to exploit Iconclass's hierarchical structure. In this paper, we propose a novel approach for Hierarchical Multi-label Classification (HMC) of iconographic concepts in images. We present three strategies, including Language Models (LMs), for the generation of textual image descriptions using keywords extracted from Iconclass. These descriptions are utilized to pre-train a Vision-Language Model (VLM) based on a newly introduced data set of 477,569 images with more than 20,000 Iconclass concepts, far more than considered in previous studies. Furthermore, we present five approaches to multi-label classification, including a novel transformer decoder that leverages hierarchical information from the Iconclass taxonomy. Experimental results show the superiority of this approach over reasonable baselines.

KW - Algorithms

KW - Applications

KW - Arts / games / social media

KW - Image recognition and understanding

KW - Vision + language and/or other modalities

UR - http://www.scopus.com/inward/record.url?scp=85192002577&partnerID=8YFLogxK

U2 - 10.1109/WACV57701.2024.00705

DO - 10.1109/WACV57701.2024.00705

M3 - Conference contribution

AN - SCOPUS:85192002577

SN - 979-8-3503-1893-7

SP - 7195

EP - 7205

BT - 2024 IEEE Winter Conference on Applications of Computer Vision

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 3 January 2024 through 8 January 2024

ER -