Details
Original language | English |
---|---|
Title of host publication | Empirical Multimodality Research |
Subtitle | Methods, Evaluations, Implications |
Editors | Jana Pflaeging, Janina Wildfeuer, John A. Bateman |
Publisher | de Gruyter |
Pages | 109-138 |
Number of pages | 30 |
ISBN (electronic) | 9783110725001 |
ISBN (print) | 9783110724912 |
Publication status | Published - 1 Jan 2021 |
Abstract
In this paper, we present approaches that automatically estimate semantic relations between textual and (pictorial) visual information. We consider the interpretation of these relations as one of the key elements for empirical research on multimodal information. From a computational perspective, it is difficult to automatically “comprehend” the meaning of multimodal information and to interpret cross-modal semantic relations. One reason is that the automatic understanding and interpretation of even a single source of information (e.g., text, image, or audio) is difficult, and it is even more difficult to model and understand the interplay of two different modalities. While the complex interplay of visual and textual information has been investigated in communication sciences and linguistics for years, it has rarely been considered from a computer science perspective. To this end, we review the few currently existing approaches to automatically recognizing semantic cross-modal relations. In previous work, we suggested modeling image-text relations along three main dimensions: cross-modal mutual information, semantic correlation, and the status relation. Using these dimensions, we characterized a set of eight image-text classes and showed their relations to existing taxonomies. Moreover, we have shown how cross-modal mutual information can be further differentiated in order to measure image-text consistency in news at the entity level of persons, locations, and scene context. Experimental results demonstrate the feasibility of the approaches.
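To make the three-dimensional characterization more concrete, the following Python sketch encodes an image-text pair as a point along the three dimensions named in the abstract and maps it to a coarse relation label. The value ranges, the `stat` vocabulary, and the label names are illustrative assumptions only; they do not reproduce the authors' exact definitions or their full set of eight classes.

```python
from dataclasses import dataclass

@dataclass
class ImageTextRelation:
    """One image-text pair, placed in the three-dimensional space
    sketched in the abstract (hypothetical value encodings)."""
    cmi: float   # cross-modal mutual information: 0.0 (no overlap) .. 1.0 (high overlap)
    sc: float    # semantic correlation: -1.0 (contradictory) .. 1.0 (consistent)
    stat: str    # status relation: "equal", "image_subordinate", or "text_subordinate"

def classify(rel: ImageTextRelation) -> str:
    """Map a point in the three-dimensional space to a coarse relation label.
    The labels below are illustrative, not the paper's eight classes."""
    if rel.cmi == 0.0 and rel.sc == 0.0:
        return "uncorrelated"    # no shared entities, no semantic link
    if rel.sc < 0.0:
        return "contradictory"   # modalities disagree (e.g., wrong person depicted)
    if rel.stat == "image_subordinate":
        return "illustration"    # image merely supports the text
    if rel.stat == "text_subordinate":
        return "anchorage"       # text pins down the image's meaning
    return "complementary"       # both modalities contribute on equal footing

# Example: a news photo that depicts the person named in the caption
print(classify(ImageTextRelation(cmi=0.8, sc=1.0, stat="equal")))  # -> complementary
```

Under the same caveats, the entity-level consistency measurement mentioned at the end of the abstract could be approximated by comparing the sets of persons, locations, and scene labels detected in the image against those named in the text.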
ASJC Scopus subject areas
- Arts and Humanities (all)
- General Arts and Humanities
- Social Sciences (all)
- General Social Sciences
Cite this
Computational Approaches for the Interpretation of Image-Text Relations. / Ewerth, Ralph; Otto, Christian; Müller-Budack, Eric. Empirical Multimodality Research: Methods, Evaluations, Implications. Ed. / Jana Pflaeging; Janina Wildfeuer; John A. Bateman. de Gruyter, 2021. pp. 109-138.
Research output: Chapter in book/report/conference proceedings › Contribution to book/anthology › Research › Peer-reviewed
TY - CHAP
T1 - Computational Approaches for the Interpretation of Image-Text Relations
AU - Ewerth, Ralph
AU - Otto, Christian
AU - Müller-Budack, Eric
PY - 2021/1/1
Y1 - 2021/1/1
N2 - In this paper, we present approaches that automatically estimate semantic relations between textual and (pictorial) visual information. We consider the interpretation of these relations as one of the key elements for empirical research on multimodal information. From a computational perspective, it is difficult to automatically “comprehend” the meaning of multimodal information and to interpret cross-modal semantic relations. One reason is that the automatic understanding and interpretation of even a single source of information (e.g., text, image, or audio) is difficult, and it is even more difficult to model and understand the interplay of two different modalities. While the complex interplay of visual and textual information has been investigated in communication sciences and linguistics for years, it has rarely been considered from a computer science perspective. To this end, we review the few currently existing approaches to automatically recognizing semantic cross-modal relations. In previous work, we suggested modeling image-text relations along three main dimensions: cross-modal mutual information, semantic correlation, and the status relation. Using these dimensions, we characterized a set of eight image-text classes and showed their relations to existing taxonomies. Moreover, we have shown how cross-modal mutual information can be further differentiated in order to measure image-text consistency in news at the entity level of persons, locations, and scene context. Experimental results demonstrate the feasibility of the approaches.
AB - In this paper, we present approaches that automatically estimate semantic relations between textual and (pictorial) visual information. We consider the interpretation of these relations as one of the key elements for empirical research on multimodal information. From a computational perspective, it is difficult to automatically “comprehend” the meaning of multimodal information and to interpret cross-modal semantic relations. One reason is that the automatic understanding and interpretation of even a single source of information (e.g., text, image, or audio) is difficult, and it is even more difficult to model and understand the interplay of two different modalities. While the complex interplay of visual and textual information has been investigated in communication sciences and linguistics for years, it has rarely been considered from a computer science perspective. To this end, we review the few currently existing approaches to automatically recognizing semantic cross-modal relations. In previous work, we suggested modeling image-text relations along three main dimensions: cross-modal mutual information, semantic correlation, and the status relation. Using these dimensions, we characterized a set of eight image-text classes and showed their relations to existing taxonomies. Moreover, we have shown how cross-modal mutual information can be further differentiated in order to measure image-text consistency in news at the entity level of persons, locations, and scene context. Experimental results demonstrate the feasibility of the approaches.
KW - Computer vision
KW - Deep learning
KW - Multimodal information retrieval
KW - Multimodal news analytics
KW - Multimodal semiotic analysis
KW - Semantic image-text classes
UR - http://www.scopus.com/inward/record.url?scp=85135273643&partnerID=8YFLogxK
U2 - 10.1515/9783110725001-005
DO - 10.1515/9783110725001-005
M3 - Contribution to book/anthology
AN - SCOPUS:85135273643
SN - 9783110724912
SP - 109
EP - 138
BT - Empirical Multimodality Research
A2 - Pflaeging, Jana
A2 - Wildfeuer, Janina
A2 - Bateman, John A.
PB - de Gruyter
ER -