Visual Narratives: Large-scale Hierarchical Classification of Art-historical Images

Publication: Contribution to book/report/anthology/conference proceedings › Conference paper › Research › Peer-reviewed

Authors

  • Matthias Springstein
  • Stefanie Schneider
  • Javad Rahnama
  • Julian Stalter
  • Maximilian Kristen
  • Eric Muller-Budack
  • Ralph Ewerth

External organisations

  • Technische Informationsbibliothek (TIB) Leibniz-Informationszentrum Technik und Naturwissenschaften und Universitätsbibliothek
  • Ludwig-Maximilians-Universität München (LMU)
  • Reply Deutschland SE

Details

Original language: English
Title of host publication: 2024 IEEE Winter Conference on Applications of Computer Vision
Subtitle: WACV
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 7195-7205
Number of pages: 11
ISBN (electronic): 9798350318920
ISBN (print): 979-8-3503-1893-7
Publication status: Published - 2024
Event: IEEE/CVF Winter Conference on Applications of Computer Vision 2024 - Waikoloa, United States
Duration: 3 Jan. 2024 to 8 Jan. 2024

Abstract

Iconography refers to the methodical study and interpretation of thematic content in the visual arts, distinguishing it, e.g., from purely formal or aesthetic considerations. In iconographic studies, Iconclass is a widely used taxonomy that encapsulates historical, biblical, and literary themes, among others. However, given the hierarchical nature and inherent complexity of such a taxonomy, it is highly desirable to use automated methods for (Iconclass-based) image classification. Previous studies either focused narrowly on certain subsets of narratives or failed to exploit Iconclass's hierarchical structure. In this paper, we propose a novel approach for Hierarchical Multi-label Classification (HMC) of iconographic concepts in images. We present three strategies, including Language Models (LMs), for the generation of textual image descriptions using keywords extracted from Iconclass. These descriptions are utilized to pre-train a Vision-Language Model (VLM) based on a newly introduced data set of 477,569 images with more than 20,000 Iconclass concepts, far more than considered in previous studies. Furthermore, we present five approaches to multi-label classification, including a novel transformer decoder that leverages hierarchical information from the Iconclass taxonomy. Experimental results show the superiority of this approach over reasonable baselines.
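
To illustrate what "hierarchical" means for the label space, the following is a minimal sketch, not the authors' implementation. It assumes a simplified view of Iconclass notations in which every prefix of a basic notation (for example `7`, `73`, `73D` for `73D82`) denotes an ancestor concept, and it ignores bracketed text and keys by cutting the notation at the first `(`. The function names `ancestors` and `expand_labels` are hypothetical.

```python
def ancestors(notation: str) -> list[str]:
    """Return all prefixes of a notation, from the root down to the notation itself.

    Simplifying assumption: each prefix of the basic notation is an ancestor.
    Anything from the first "(" onward (bracketed text, keys) is dropped.
    """
    base = notation.split("(", 1)[0].strip()
    return [base[:i] for i in range(1, len(base) + 1)]


def expand_labels(labels: list[str]) -> list[str]:
    """Close a multi-label annotation under the ancestor relation."""
    expanded: set[str] = set()
    for label in labels:
        expanded.update(ancestors(label))
    return sorted(expanded)


if __name__ == "__main__":
    # Two leaf annotations that share the ancestors 7, 73 and 73D.
    print(expand_labels(["73D82", "73D6"]))
```

Expanding annotations this way turns a flat multi-label target into one that is consistent with the taxonomy, which is the kind of structure a hierarchy-aware classifier can exploit.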

Cite this

Visual Narratives: Large-scale Hierarchical Classification of Art-historical Images. / Springstein, Matthias; Schneider, Stefanie; Rahnama, Javad et al.
2024 IEEE Winter Conference on Applications of Computer Vision: WACV. Institute of Electrical and Electronics Engineers Inc., 2024. pp. 7195-7205.

Springstein, M, Schneider, S, Rahnama, J, Stalter, J, Kristen, M, Muller-Budack, E & Ewerth, R 2024, Visual Narratives: Large-scale Hierarchical Classification of Art-historical Images. in 2024 IEEE Winter Conference on Applications of Computer Vision: WACV. Institute of Electrical and Electronics Engineers Inc., pp. 7195-7205, IEEE/CVF Winter Conference on Applications of Computer Vision 2024, Waikoloa, United States, 3 Jan. 2024. https://doi.org/10.1109/WACV57701.2024.00705
Springstein, M., Schneider, S., Rahnama, J., Stalter, J., Kristen, M., Muller-Budack, E., & Ewerth, R. (2024). Visual Narratives: Large-scale Hierarchical Classification of Art-historical Images. In 2024 IEEE Winter Conference on Applications of Computer Vision: WACV (pp. 7195-7205). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/WACV57701.2024.00705
Springstein M, Schneider S, Rahnama J, Stalter J, Kristen M, Muller-Budack E et al. Visual Narratives: Large-scale Hierarchical Classification of Art-historical Images. In 2024 IEEE Winter Conference on Applications of Computer Vision: WACV. Institute of Electrical and Electronics Engineers Inc. 2024. p. 7195-7205. doi: 10.1109/WACV57701.2024.00705
Springstein, Matthias; Schneider, Stefanie; Rahnama, Javad et al. / Visual Narratives: Large-scale Hierarchical Classification of Art-historical Images. 2024 IEEE Winter Conference on Applications of Computer Vision: WACV. Institute of Electrical and Electronics Engineers Inc., 2024. pp. 7195-7205.
@inproceedings{265dfa5d21244e939d29f92bf0829bac,
title = "Visual Narratives: Large-scale Hierarchical Classification of Art-historical Images",
abstract = "Iconography refers to the methodical study and interpretation of thematic content in the visual arts, distinguishing it, e.g., from purely formal or aesthetic considerations. In iconographic studies, Iconclass is a widely used taxonomy that encapsulates historical, biblical, and literary themes, among others. However, given the hierarchical nature and inherent complexity of such a taxonomy, it is highly desirable to use automated methods for (Iconclass-based) image classification. Previous studies either focused narrowly on certain subsets of narratives or failed to exploit Iconclass's hierarchical structure. In this paper, we propose a novel approach for Hierarchical Multi-label Classification (HMC) of iconographic concepts in images. We present three strategies, including Language Models (LMs), for the generation of textual image descriptions using keywords extracted from Iconclass. These descriptions are utilized to pre-train a Vision-Language Model (VLM) based on a newly introduced data set of 477,569 images with more than 20,000 Iconclass concepts, far more than considered in previous studies. Furthermore, we present five approaches to multi-label classification, including a novel transformer decoder that leverages hierarchical information from the Iconclass taxonomy. Experimental results show the superiority of this approach over reasonable baselines.",
keywords = "Algorithms, Applications, Arts / games / social media, Image recognition and understanding, Vision + language and/or other modalities",
author = "Matthias Springstein and Stefanie Schneider and Javad Rahnama and Julian Stalter and Maximilian Kristen and Eric Muller-Budack and Ralph Ewerth",
note = "Publisher Copyright: {\textcopyright} 2024 IEEE.; IEEE/CVF Winter Conference on Applications of Computer Vision 2024, WACV ; Conference date: 03-01-2024 Through 08-01-2024",
year = "2024",
doi = "10.1109/WACV57701.2024.00705",
language = "English",
isbn = "979-8-3503-1893-7",
pages = "7195--7205",
booktitle = "2024 IEEE Winter Conference on Applications of Computer Vision",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}


TY - GEN
T1 - Visual Narratives
T2 - IEEE/CVF Winter Conference on Applications of Computer Vision 2024
AU - Springstein, Matthias
AU - Schneider, Stefanie
AU - Rahnama, Javad
AU - Stalter, Julian
AU - Kristen, Maximilian
AU - Muller-Budack, Eric
AU - Ewerth, Ralph
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
AB - Iconography refers to the methodical study and interpretation of thematic content in the visual arts, distinguishing it, e.g., from purely formal or aesthetic considerations. In iconographic studies, Iconclass is a widely used taxonomy that encapsulates historical, biblical, and literary themes, among others. However, given the hierarchical nature and inherent complexity of such a taxonomy, it is highly desirable to use automated methods for (Iconclass-based) image classification. Previous studies either focused narrowly on certain subsets of narratives or failed to exploit Iconclass's hierarchical structure. In this paper, we propose a novel approach for Hierarchical Multi-label Classification (HMC) of iconographic concepts in images. We present three strategies, including Language Models (LMs), for the generation of textual image descriptions using keywords extracted from Iconclass. These descriptions are utilized to pre-train a Vision-Language Model (VLM) based on a newly introduced data set of 477,569 images with more than 20,000 Iconclass concepts, far more than considered in previous studies. Furthermore, we present five approaches to multi-label classification, including a novel transformer decoder that leverages hierarchical information from the Iconclass taxonomy. Experimental results show the superiority of this approach over reasonable baselines.
KW - Algorithms
KW - Applications
KW - Arts / games / social media
KW - Image recognition and understanding
KW - Vision + language and/or other modalities
UR - http://www.scopus.com/inward/record.url?scp=85192002577&partnerID=8YFLogxK
U2 - 10.1109/WACV57701.2024.00705
DO - 10.1109/WACV57701.2024.00705
M3 - Conference contribution
AN - SCOPUS:85192002577
SN - 979-8-3503-1893-7
SP - 7195
EP - 7205
BT - 2024 IEEE Winter Conference on Applications of Computer Vision
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 3 January 2024 through 8 January 2024
ER -