Understanding image-text relations and news values for multimodal news analysis

Publication: Contribution to journal › Article › Research › Peer reviewed

Authors

  • Gullal S. Cheema
  • Sherzod Hakimov
  • Eric Müller-Budack
  • Christian Otto
  • John A. Bateman
  • Ralph Ewerth

Organizational units

External organizations

  • Technische Informationsbibliothek (TIB) Leibniz-Informationszentrum Technik und Naturwissenschaften und Universitätsbibliothek
  • Universität Potsdam
  • Universität Bremen

Details

Original language: English
Article number: 1125533
Journal: Frontiers in Artificial Intelligence
Volume: 6
Publication status: Published - 2 May 2023

Abstract

The analysis of news dissemination is of utmost importance, since the credibility of information and the identification of disinformation and misinformation affect society as a whole. Given the large amounts of news data published daily on the Web, both the empirical analysis of news with regard to research questions and the detection of problematic news content require computational methods that work at scale. Today's online news is typically disseminated in multimodal form, combining presentation modalities such as text, image, audio, and video. Recent developments in multimodal machine learning now make it possible to capture basic "descriptive" relations between modalities, such as correspondences between words and phrases on the one hand and visual depictions of the verbally expressed information on the other. Although such advances have enabled tremendous progress in tasks like image captioning, text-to-image generation, and visual question answering, domains such as news dissemination require going further. In this paper, we introduce a novel framework for the computational analysis of multimodal news. We motivate a set of more complex image-text relations as well as multimodal news values based on real examples of news reports, and consider their realization by computational approaches. To this end, we provide (a) an overview of existing literature from semiotics, where detailed proposals have been made for taxonomies covering diverse image-text relations generalizable to any domain; (b) an overview of computational work that derives models of image-text relations from data; and (c) an overview of a particular class of news-centric attributes developed in journalism studies, called news values. The result is a novel framework for multimodal news analysis that closes existing gaps in previous work while maintaining and combining the strengths of those accounts. We assess and discuss the elements of the framework with real-world examples and use cases, setting out research directions at the intersection of multimodal learning, multimodal analytics, and computational social sciences that can benefit from our approach.

ASJC Scopus subject areas

Cite

Understanding image-text relations and news values for multimodal news analysis. / Cheema, Gullal S.; Hakimov, Sherzod; Müller-Budack, Eric et al.
In: Frontiers in Artificial Intelligence, Vol. 6, 1125533, 02.05.2023.


Cheema, GS, Hakimov, S, Müller-Budack, E, Otto, C, Bateman, JA & Ewerth, R 2023, 'Understanding image-text relations and news values for multimodal news analysis', Frontiers in Artificial Intelligence, vol. 6, 1125533. https://doi.org/10.3389/frai.2023.1125533
Cheema, G. S., Hakimov, S., Müller-Budack, E., Otto, C., Bateman, J. A., & Ewerth, R. (2023). Understanding image-text relations and news values for multimodal news analysis. Frontiers in Artificial Intelligence, 6, Article 1125533. https://doi.org/10.3389/frai.2023.1125533
Cheema GS, Hakimov S, Müller-Budack E, Otto C, Bateman JA, Ewerth R. Understanding image-text relations and news values for multimodal news analysis. Frontiers in Artificial Intelligence. 2023 May 2;6:1125533. doi: 10.3389/frai.2023.1125533
Cheema, Gullal S. ; Hakimov, Sherzod ; Müller-Budack, Eric et al. / Understanding image-text relations and news values for multimodal news analysis. In: Frontiers in Artificial Intelligence. 2023 ; Vol. 6.
Download (BibTeX)
@article{be05bf9804b948b6962a0733ff0a4ee9,
title = "Understanding image-text relations and news values for multimodal news analysis",
abstract = "The analysis of news dissemination is of utmost importance since the credibility of information and the identification of disinformation and misinformation affect society as a whole. Given the large amounts of news data published daily on the Web, the empirical analysis of news with regard to research questions and the detection of problematic news content on the Web require computational methods that work at scale. Today's online news are typically disseminated in a multimodal form, including various presentation modalities such as text, image, audio, and video. Recent developments in multimodal machine learning now make it possible to capture basic “descriptive” relations between modalities–such as correspondences between words and phrases, on the one hand, and corresponding visual depictions of the verbally expressed information on the other. Although such advances have enabled tremendous progress in tasks like image captioning, text-to-image generation and visual question answering, in domains such as news dissemination, there is a need to go further. In this paper, we introduce a novel framework for the computational analysis of multimodal news. We motivate a set of more complex image-text relations as well as multimodal news values based on real examples of news reports and consider their realization by computational approaches. To this end, we provide (a) an overview of existing literature from semiotics where detailed proposals have been made for taxonomies covering diverse image-text relations generalisable to any domain; (b) an overview of computational work that derives models of image-text relations from data; and (c) an overview of a particular class of news-centric attributes developed in journalism studies called news values. The result is a novel framework for multimodal news analysis that closes existing gaps in previous work while maintaining and combining the strengths of those accounts. 
We assess and discuss the elements of the framework with real-world examples and use cases, setting out research directions at the intersection of multimodal learning, multimodal analytics and computational social sciences that can benefit from our approach.",
keywords = "computational analytics, image-text relations, journalism, machine learning, multimodality, news analysis, news values, semiotics",
author = "Cheema, {Gullal S.} and Sherzod Hakimov and Eric M{\"u}ller-Budack and Christian Otto and Bateman, {John A.} and Ralph Ewerth",
note = "Funding Information: This work was funded by European Union's Horizon 2020 research and innovation programme under the Marie Sk{\l}odowska-Curie grant agreement no 812997 (CLEOPATRA project) and by the German Federal Ministry of Education and Research (BMBF, FakeNarratives project, no. 16KIS1517). The publication of this article was funded by the Open Access Fund of Technische Informationsbibliothek (TIB). ",
year = "2023",
month = may,
day = "2",
doi = "10.3389/frai.2023.1125533",
language = "English",
volume = "6",
journal = "Frontiers in Artificial Intelligence",
issn = "2624-8212",
}

Download (RIS)

TY - JOUR

T1 - Understanding image-text relations and news values for multimodal news analysis

AU - Cheema, Gullal S.

AU - Hakimov, Sherzod

AU - Müller-Budack, Eric

AU - Otto, Christian

AU - Bateman, John A.

AU - Ewerth, Ralph

N1 - Funding Information: This work was funded by European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement no 812997 (CLEOPATRA project) and by the German Federal Ministry of Education and Research (BMBF, FakeNarratives project, no. 16KIS1517). The publication of this article was funded by the Open Access Fund of Technische Informationsbibliothek (TIB).

PY - 2023/5/2

Y1 - 2023/5/2

N2 - The analysis of news dissemination is of utmost importance since the credibility of information and the identification of disinformation and misinformation affect society as a whole. Given the large amounts of news data published daily on the Web, the empirical analysis of news with regard to research questions and the detection of problematic news content on the Web require computational methods that work at scale. Today's online news are typically disseminated in a multimodal form, including various presentation modalities such as text, image, audio, and video. Recent developments in multimodal machine learning now make it possible to capture basic “descriptive” relations between modalities–such as correspondences between words and phrases, on the one hand, and corresponding visual depictions of the verbally expressed information on the other. Although such advances have enabled tremendous progress in tasks like image captioning, text-to-image generation and visual question answering, in domains such as news dissemination, there is a need to go further. In this paper, we introduce a novel framework for the computational analysis of multimodal news. We motivate a set of more complex image-text relations as well as multimodal news values based on real examples of news reports and consider their realization by computational approaches. To this end, we provide (a) an overview of existing literature from semiotics where detailed proposals have been made for taxonomies covering diverse image-text relations generalisable to any domain; (b) an overview of computational work that derives models of image-text relations from data; and (c) an overview of a particular class of news-centric attributes developed in journalism studies called news values. The result is a novel framework for multimodal news analysis that closes existing gaps in previous work while maintaining and combining the strengths of those accounts. 
We assess and discuss the elements of the framework with real-world examples and use cases, setting out research directions at the intersection of multimodal learning, multimodal analytics and computational social sciences that can benefit from our approach.

AB - The analysis of news dissemination is of utmost importance since the credibility of information and the identification of disinformation and misinformation affect society as a whole. Given the large amounts of news data published daily on the Web, the empirical analysis of news with regard to research questions and the detection of problematic news content on the Web require computational methods that work at scale. Today's online news are typically disseminated in a multimodal form, including various presentation modalities such as text, image, audio, and video. Recent developments in multimodal machine learning now make it possible to capture basic “descriptive” relations between modalities–such as correspondences between words and phrases, on the one hand, and corresponding visual depictions of the verbally expressed information on the other. Although such advances have enabled tremendous progress in tasks like image captioning, text-to-image generation and visual question answering, in domains such as news dissemination, there is a need to go further. In this paper, we introduce a novel framework for the computational analysis of multimodal news. We motivate a set of more complex image-text relations as well as multimodal news values based on real examples of news reports and consider their realization by computational approaches. To this end, we provide (a) an overview of existing literature from semiotics where detailed proposals have been made for taxonomies covering diverse image-text relations generalisable to any domain; (b) an overview of computational work that derives models of image-text relations from data; and (c) an overview of a particular class of news-centric attributes developed in journalism studies called news values. The result is a novel framework for multimodal news analysis that closes existing gaps in previous work while maintaining and combining the strengths of those accounts. 
We assess and discuss the elements of the framework with real-world examples and use cases, setting out research directions at the intersection of multimodal learning, multimodal analytics and computational social sciences that can benefit from our approach.

KW - computational analytics

KW - image-text relations

KW - journalism

KW - machine learning

KW - multimodality

KW - news analysis

KW - news values

KW - semiotics

UR - http://www.scopus.com/inward/record.url?scp=85159892399&partnerID=8YFLogxK

U2 - 10.3389/frai.2023.1125533

DO - 10.3389/frai.2023.1125533

M3 - Article

AN - SCOPUS:85159892399

VL - 6

JO - Frontiers in Artificial Intelligence

JF - Frontiers in Artificial Intelligence

M1 - 1125533

ER -