Details
| Original language | English |
| --- | --- |
| Title of host publication | Empirical Multimodality Research |
| Subtitle of host publication | Methods, Evaluations, Implications |
| Editors | Jana Pflaeging, Janina Wildfeuer, John A. Bateman |
| Publisher | de Gruyter |
| Pages | 109-138 |
| Number of pages | 30 |
| ISBN (electronic) | 9783110725001 |
| ISBN (print) | 9783110724912 |
| Publication status | Published - 1 Jan 2021 |
Abstract
In this paper, we present approaches that automatically estimate semantic relations between textual and (pictorial) visual information. We consider the interpretation of these relations as one of the key elements for empirical research on multimodal information. From a computational perspective, it is difficult to automatically “comprehend” the meaning of multimodal information and to interpret cross-modal semantic relations. One reason is that even the automatic understanding and interpretation of a single source of information (e.g., text, image, or audio) is difficult, and it is more difficult still to model and understand the interplay of two different modalities. While the complex interplay of visual and textual information has been investigated in communication sciences and linguistics for years, it has rarely been considered from a computer science perspective. To this end, we review the few existing approaches that automatically recognize semantic cross-modal relations. In previous work, we have suggested modeling image-text relations along three main dimensions: cross-modal mutual information, semantic correlation, and the status relation. Using these dimensions, we characterized a set of eight image-text classes and showed their relations to existing taxonomies. Moreover, we have shown how cross-modal mutual information can be further differentiated to measure image-text consistency in news at the entity level of persons, locations, and scene context. Experimental results demonstrate the feasibility of the approaches.
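To make the three-dimensional characterization of image-text relations mentioned in the abstract more concrete, the following minimal Python sketch shows one possible way to represent such a relation and map it to a coarse class label. All names, value ranges, thresholds, and class labels are illustrative assumptions; they are not the authors' definitions of the eight image-text classes.

```python
# Minimal sketch (illustrative only): representing an image-text relation along the
# three dimensions described in the abstract and mapping it to a coarse class label.
from dataclasses import dataclass


@dataclass
class ImageTextRelation:
    # Assumed value ranges; the chapter does not prescribe these.
    cross_modal_mutual_information: float  # overlap of depicted/mentioned content, e.g. in [0, 1]
    semantic_correlation: float            # coherence of conveyed meanings, e.g. in [-1, 1]
    status_relation: str                   # relative importance, e.g. "equal", "image-subordinate", "text-subordinate"


def assign_image_text_class(rel: ImageTextRelation) -> str:
    """Map a point in the three-dimensional space to a class label.

    The chapter characterizes eight image-text classes; the thresholds and
    labels below are placeholders, not the authors' taxonomy.
    """
    if rel.cross_modal_mutual_information < 0.1 and abs(rel.semantic_correlation) < 0.1:
        return "uncorrelated"
    if rel.semantic_correlation < 0:
        return "contrasting"
    return "complementary" if rel.status_relation == "equal" else "illustrative"


if __name__ == "__main__":
    example = ImageTextRelation(0.7, 0.9, "equal")
    print(assign_image_text_class(example))  # -> "complementary"
```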
Keywords
- Computer vision
- Deep learning
- Multimodal information retrieval
- Multimodal news analytics
- Multimodal semiotic analysis
- Semantic image-text classes
ASJC Scopus subject areas
- General Arts and Humanities
- General Social Sciences
Cite this
Computational Approaches for the Interpretation of Image-Text Relations. / Ewerth, Ralph; Otto, Christian; Müller-Budack, Eric. Empirical Multimodality Research: Methods, Evaluations, Implications. ed. / Jana Pflaeging; Janina Wildfeuer; John A. Bateman. de Gruyter, 2021. p. 109-138.
Research output: Chapter in book/report/conference proceeding › Contribution to book/anthology › Research › peer review
TY - CHAP
T1 - Computational Approaches for the Interpretation of Image-Text Relations
AU - Ewerth, Ralph
AU - Otto, Christian
AU - Müller-Budack, Eric
PY - 2021/1/1
Y1 - 2021/1/1
N2 - In this paper, we present approaches that automatically estimate semantic relations between textual and (pictorial) visual information. We consider the interpretation of these relations as one of the key elements for empirical research on multimodal information. From a computational perspective, it is difficult to automatically “comprehend” the meaning of multimodal information and to interpret cross-modal semantic relations. One reason is that even the automatic understanding and interpretation of a single source of information (e.g., text, image, or audio) is difficult, and it is more difficult still to model and understand the interplay of two different modalities. While the complex interplay of visual and textual information has been investigated in communication sciences and linguistics for years, it has rarely been considered from a computer science perspective. To this end, we review the few existing approaches that automatically recognize semantic cross-modal relations. In previous work, we have suggested modeling image-text relations along three main dimensions: cross-modal mutual information, semantic correlation, and the status relation. Using these dimensions, we characterized a set of eight image-text classes and showed their relations to existing taxonomies. Moreover, we have shown how cross-modal mutual information can be further differentiated to measure image-text consistency in news at the entity level of persons, locations, and scene context. Experimental results demonstrate the feasibility of the approaches.
AB - In this paper, we present approaches that automatically estimate semantic relations between textual and (pictorial) visual information. We consider the interpretation of these relations as one of the key elements for empirical research on multimodal information. From a computational perspective, it is difficult to automatically “comprehend” the meaning of multimodal information and to interpret cross-modal semantic relations. One reason is that even the automatic understanding and interpretation of a single source of information (e.g., text, image, or audio) is difficult, and it is more difficult still to model and understand the interplay of two different modalities. While the complex interplay of visual and textual information has been investigated in communication sciences and linguistics for years, it has rarely been considered from a computer science perspective. To this end, we review the few existing approaches that automatically recognize semantic cross-modal relations. In previous work, we have suggested modeling image-text relations along three main dimensions: cross-modal mutual information, semantic correlation, and the status relation. Using these dimensions, we characterized a set of eight image-text classes and showed their relations to existing taxonomies. Moreover, we have shown how cross-modal mutual information can be further differentiated to measure image-text consistency in news at the entity level of persons, locations, and scene context. Experimental results demonstrate the feasibility of the approaches.
KW - Computer vision
KW - Deep learning
KW - Multimodal information retrieval
KW - Multimodal news analytics
KW - Multimodal semiotic analysis
KW - Semantic image-text classes
UR - http://www.scopus.com/inward/record.url?scp=85135273643&partnerID=8YFLogxK
U2 - 10.1515/9783110725001-005
DO - 10.1515/9783110725001-005
M3 - Contribution to book/anthology
AN - SCOPUS:85135273643
SN - 9783110724912
SP - 109
EP - 138
BT - Empirical Multimodality Research
A2 - Pflaeging, Jana
A2 - Wildfeuer, Janina
A2 - Bateman, John A.
PB - de Gruyter
ER -