Details
Original language | English |
---|---|
Title of host publication | Empirical Multimodality Research |
Subtitle | Methods, Evaluations, Implications |
Editors | Jana Pflaeging, Janina Wildfeuer, John A. Bateman |
Publisher | de Gruyter |
Pages | 109-138 |
Number of pages | 30 |
ISBN (electronic) | 9783110725001 |
ISBN (print) | 9783110724912 |
Publication status | Published - 1 Jan 2021 |
Abstract
In this paper, we present approaches that automatically estimate semantic relations between textual and (pictorial) visual information. We consider the interpretation of these relations as one of the key elements for empirical research on multimodal information. From a computational perspective, it is difficult to automatically “comprehend” the meaning of multimodal information and to interpret cross-modal semantic relations. One reason is that the automatic understanding and interpretation of even a single source of information (e.g., text, image, or audio) is difficult, and it is even more difficult to model and understand the interplay of two different modalities. While the complex interplay of visual and textual information has been investigated in communication sciences and linguistics for years, it has rarely been considered from a computer science perspective. To this end, we review the few currently existing approaches to automatically recognizing semantic cross-modal relations. In previous work, we suggested modeling image-text relations along three main dimensions: cross-modal mutual information, semantic correlation, and the status relation. Using these dimensions, we characterized a set of eight image-text classes and showed their relations to existing taxonomies. Moreover, we have shown how cross-modal mutual information can be further differentiated in order to measure image-text consistency in news at the entity level of persons, locations, and scene context. Experimental results demonstrate the feasibility of the approaches.
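To make the three-dimensional characterization more concrete, the following Python sketch encodes an image-text pair as a point along the three dimensions named in the abstract and maps it to a coarse relation label. The value ranges, the `stat` vocabulary, and the label names are illustrative assumptions only; they do not reproduce the authors' exact definitions or their full set of eight classes.

```python
from dataclasses import dataclass

@dataclass
class ImageTextRelation:
    """One image-text pair, placed in the three-dimensional space
    sketched in the abstract (hypothetical value encodings)."""
    cmi: float   # cross-modal mutual information: 0.0 (no overlap) .. 1.0 (high overlap)
    sc: float    # semantic correlation: -1.0 (contradictory) .. 1.0 (consistent)
    stat: str    # status relation: "equal", "image_subordinate", or "text_subordinate"

def classify(rel: ImageTextRelation) -> str:
    """Map a point in the three-dimensional space to a coarse relation label.
    The labels below are illustrative, not the paper's eight classes."""
    if rel.cmi == 0.0 and rel.sc == 0.0:
        return "uncorrelated"    # no shared entities, no semantic link
    if rel.sc < 0.0:
        return "contradictory"   # modalities disagree (e.g., wrong person depicted)
    if rel.stat == "image_subordinate":
        return "illustration"    # image merely supports the text
    if rel.stat == "text_subordinate":
        return "anchorage"       # text pins down the image's meaning
    return "complementary"       # both modalities contribute on equal footing

# Example: a news photo that depicts the person named in the caption
print(classify(ImageTextRelation(cmi=0.8, sc=1.0, stat="equal")))  # -> complementary
```

Under the same caveats, the entity-level consistency measurement mentioned at the end of the abstract could be approximated by comparing the sets of persons, locations, and scene labels detected in the image against those named in the text.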
ASJC Scopus subject areas
- Arts and Humanities (all)
- General Arts and Humanities
- Social Sciences (all)
- General Social Sciences
Cite this
Computational Approaches for the Interpretation of Image-Text Relations. / Ewerth, Ralph; Otto, Christian; Müller-Budack, Eric. Empirical Multimodality Research: Methods, Evaluations, Implications. Ed. / Jana Pflaeging; Janina Wildfeuer; John A. Bateman. de Gruyter, 2021. pp. 109-138.
Research output: Chapter in book/report/conference proceedings › Contribution to book/anthology › Research › Peer-reviewed
TY - CHAP
T1 - Computational Approaches for the Interpretation of Image-Text Relations
AU - Ewerth, Ralph
AU - Otto, Christian
AU - Müller-Budack, Eric
PY - 2021/1/1
Y1 - 2021/1/1
N2 - In this paper, we present approaches that automatically estimate semantic relations between textual and (pictorial) visual information. We consider the interpretation of these relations as one of the key elements for empirical research on multimodal information. From a computational perspective, it is difficult to automatically “comprehend” the meaning of multimodal information and to interpret cross-modal semantic relations. One reason is that the automatic understanding and interpretation of even a single source of information (e.g., text, image, or audio) is difficult, and it is even more difficult to model and understand the interplay of two different modalities. While the complex interplay of visual and textual information has been investigated in communication sciences and linguistics for years, it has rarely been considered from a computer science perspective. To this end, we review the few currently existing approaches to automatically recognizing semantic cross-modal relations. In previous work, we suggested modeling image-text relations along three main dimensions: cross-modal mutual information, semantic correlation, and the status relation. Using these dimensions, we characterized a set of eight image-text classes and showed their relations to existing taxonomies. Moreover, we have shown how cross-modal mutual information can be further differentiated in order to measure image-text consistency in news at the entity level of persons, locations, and scene context. Experimental results demonstrate the feasibility of the approaches.
AB - In this paper, we present approaches that automatically estimate semantic relations between textual and (pictorial) visual information. We consider the interpretation of these relations as one of the key elements for empirical research on multimodal information. From a computational perspective, it is difficult to automatically “comprehend” the meaning of multimodal information and to interpret cross-modal semantic relations. One reason is that the automatic understanding and interpretation of even a single source of information (e.g., text, image, or audio) is difficult, and it is even more difficult to model and understand the interplay of two different modalities. While the complex interplay of visual and textual information has been investigated in communication sciences and linguistics for years, it has rarely been considered from a computer science perspective. To this end, we review the few currently existing approaches to automatically recognizing semantic cross-modal relations. In previous work, we suggested modeling image-text relations along three main dimensions: cross-modal mutual information, semantic correlation, and the status relation. Using these dimensions, we characterized a set of eight image-text classes and showed their relations to existing taxonomies. Moreover, we have shown how cross-modal mutual information can be further differentiated in order to measure image-text consistency in news at the entity level of persons, locations, and scene context. Experimental results demonstrate the feasibility of the approaches.
KW - Computer vision
KW - Deep learning
KW - Multimodal information retrieval
KW - Multimodal news analytics
KW - Multimodal semiotic analysis
KW - Semantic image-text classes
UR - http://www.scopus.com/inward/record.url?scp=85135273643&partnerID=8YFLogxK
U2 - 10.1515/9783110725001-005
DO - 10.1515/9783110725001-005
M3 - Contribution to book/anthology
AN - SCOPUS:85135273643
SN - 9783110724912
SP - 109
EP - 138
BT - Empirical Multimodality Research
A2 - Pflaeging, Jana
A2 - Wildfeuer, Janina
A2 - Bateman, John A.
PB - de Gruyter
ER -