Details
Original language | English |
---|---|
Title of host publication | 2024 IEEE Winter Conference on Applications of Computer Vision |
Subtitle of host publication | WACV |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 7195-7205 |
Number of pages | 11 |
ISBN (electronic) | 979-8-3503-1892-0 |
ISBN (print) | 979-8-3503-1893-7 |
Publication status | Published - 2024 |
Event | IEEE/CVF Winter Conference on Applications of Computer Vision 2024 - Waikoloa, United States; Duration: 3 Jan 2024 → 8 Jan 2024 |
Abstract
Iconography refers to the methodical study and interpretation of thematic content in the visual arts, distinguishing it, e.g., from purely formal or aesthetic considerations. In iconographic studies, Iconclass is a widely used taxonomy that encapsulates historical, biblical, and literary themes, among others. However, given the hierarchical nature and inherent complexity of such a taxonomy, it is highly desirable to use automated methods for (Iconclass-based) image classification. Previous studies either focused narrowly on certain subsets of narratives or failed to exploit Iconclass's hierarchical structure. In this paper, we propose a novel approach for Hierarchical Multi-label Classification (HMC) of iconographic concepts in images. We present three strategies, including Language Models (LMs), for the generation of textual image descriptions using keywords extracted from Iconclass. These descriptions are utilized to pre-train a Vision-Language Model (VLM) based on a newly introduced data set of 477,569 images with more than 20,000 Iconclass concepts, far more than considered in previous studies. Furthermore, we present five approaches to multi-label classification, including a novel transformer decoder that leverages hierarchical information from the Iconclass taxonomy. Experimental results show the superiority of this approach over reasonable baselines.
Keywords
- Algorithms, Applications, Arts / games / social media, Image recognition and understanding, Vision + language and/or other modalities
ASJC Scopus subject areas
- Computer Science (all)
- Artificial Intelligence
- Computer Science Applications
- Computer Vision and Pattern Recognition
Cite this
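Visual Narratives. / Springstein, Matthias; Schneider, Stefanie; Rahnama, Javad; Stalter, Julian; Kristen, Maximilian; Muller-Budack, Eric; Ewerth, Ralph. 2024 IEEE Winter Conference on Applications of Computer Vision: WACV. Institute of Electrical and Electronics Engineers Inc., 2024. p. 7195-7205.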
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Visual Narratives
T2 - IEEE/CVF Winter Conference on Applications of Computer Vision 2024
AU - Springstein, Matthias
AU - Schneider, Stefanie
AU - Rahnama, Javad
AU - Stalter, Julian
AU - Kristen, Maximilian
AU - Muller-Budack, Eric
AU - Ewerth, Ralph
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
AB - Iconography refers to the methodical study and interpretation of thematic content in the visual arts, distinguishing it, e.g., from purely formal or aesthetic considerations. In iconographic studies, Iconclass is a widely used taxonomy that encapsulates historical, biblical, and literary themes, among others. However, given the hierarchical nature and inherent complexity of such a taxonomy, it is highly desirable to use automated methods for (Iconclass-based) image classification. Previous studies either focused narrowly on certain subsets of narratives or failed to exploit Iconclass's hierarchical structure. In this paper, we propose a novel approach for Hierarchical Multi-label Classification (HMC) of iconographic concepts in images. We present three strategies, including Language Models (LMs), for the generation of textual image descriptions using keywords extracted from Iconclass. These descriptions are utilized to pre-train a Vision-Language Model (VLM) based on a newly introduced data set of 477,569 images with more than 20,000 Iconclass concepts, far more than considered in previous studies. Furthermore, we present five approaches to multi-label classification, including a novel transformer decoder that leverages hierarchical information from the Iconclass taxonomy. Experimental results show the superiority of this approach over reasonable baselines.
KW - Algorithms
KW - Applications
KW - Arts / games / social media
KW - Image recognition and understanding
KW - Vision + language and/or other modalities
UR - http://www.scopus.com/inward/record.url?scp=85192002577&partnerID=8YFLogxK
U2 - 10.1109/WACV57701.2024.00705
DO - 10.1109/WACV57701.2024.00705
M3 - Conference contribution
AN - SCOPUS:85192002577
SN - 979-8-3503-1893-7
SP - 7195
EP - 7205
BT - 2024 IEEE Winter Conference on Applications of Computer Vision
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 3 January 2024 through 8 January 2024
ER -