Details
Original language | English |
---|---|
Pages (from - to) | 111-125 |
Number of pages | 15 |
Journal | Int. J. Multim. Inf. Retr. |
Volume | 10 |
Issue number | 2 |
Early online date | 28 Apr 2021 |
Publication status | Published - June 2021 |
Abstract
The World Wide Web has become a popular source to gather information and news. Multimodal information, e.g., text supplemented with photographs, is typically used to convey the news more effectively or to attract attention. The photographs can be decorative or depict additional details, but might also contain misleading information. The quantification of the cross-modal consistency of entity representations can assist human assessors in evaluating the overall multimodal message. In some cases such measures might give hints to detect fake news, which is an increasingly important topic in today’s society. In this paper, we present a multimodal approach to quantify the entity coherence between image and text in real-world news. Named entity linking is applied to extract persons, locations, and events from news texts. Several measures are suggested to calculate the cross-modal similarity of the entities in text and photograph by exploiting state-of-the-art computer vision approaches. In contrast to previous work, our system automatically acquires example data from the Web and is applicable to real-world news. Moreover, an approach that quantifies contextual image-text relations is introduced. The feasibility is demonstrated on two datasets that cover different languages, topics, and domains.
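The abstract describes aggregating per-entity image-text similarities into document-level consistency scores. A minimal sketch of that idea is shown below; the function name, the aggregation choices (mean, max, verified ratio), and the threshold are illustrative assumptions, not the measures actually proposed in the paper, and the per-entity similarities are assumed to come from a separate vision model.

```python
# Hypothetical sketch: aggregate per-entity image-text similarities
# (each in [0, 1], as produced by some vision model) into
# document-level cross-modal consistency scores.
from statistics import mean


def entity_consistency(similarities, threshold=0.5):
    """Aggregate per-entity similarities into document-level scores.

    similarities: dict mapping entity name -> image-text similarity in [0, 1]
    threshold: assumed cut-off above which an entity counts as
               visually verified (illustrative choice, not from the paper)
    """
    if not similarities:
        return {"mean": 0.0, "max": 0.0, "verified_ratio": 0.0}
    values = list(similarities.values())
    return {
        "mean": mean(values),                 # average cross-modal support
        "max": max(values),                   # strongest single entity match
        # fraction of entities whose visual evidence exceeds the threshold
        "verified_ratio": mean(1.0 if s >= threshold else 0.0 for s in values),
    }


scores = entity_consistency({"Angela Merkel": 0.91, "Berlin": 0.34})
```

A low mean or verified ratio would then flag the image-text pair for closer human inspection, in line with the assistive use case the abstract outlines.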
ASJC Scopus subject areas
- Computer Science (all)
- Information Systems
- Social Sciences (all)
- Library and Information Sciences
- Engineering (all)
- Media Technology
Cite
- Standard
- Harvard
- APA
- Vancouver
- BibTeX
- RIS
In: Int. J. Multim. Inf. Retr., Vol. 10, No. 2, 06.2021, p. 111-125.
Publication: Contribution to journal › Article › Research › Peer review
TY - JOUR
T1 - Multimodal news analytics using measures of cross-modal entity and context consistency
AU - Müller-Budack, Eric
AU - Theiner, Jonas
AU - Diering, Sebastian
AU - Idahl, Maximilian
AU - Hakimov, Sherzod
AU - Ewerth, Ralph
N1 - Funding Information: This work has partially received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Grant Agreement No 812997, and the German Research Foundation (DFG: Deutsche Forschungsgemeinschaft, project number: 388420599). We are very grateful to Avishek Anand (L3S Research Center, Leibniz University Hannover) for his valuable comments that improved the quality of the paper.
PY - 2021/6
Y1 - 2021/6
AB - The World Wide Web has become a popular source to gather information and news. Multimodal information, e.g., text supplemented with photographs, is typically used to convey the news more effectively or to attract attention. The photographs can be decorative or depict additional details, but might also contain misleading information. The quantification of the cross-modal consistency of entity representations can assist human assessors in evaluating the overall multimodal message. In some cases such measures might give hints to detect fake news, which is an increasingly important topic in today’s society. In this paper, we present a multimodal approach to quantify the entity coherence between image and text in real-world news. Named entity linking is applied to extract persons, locations, and events from news texts. Several measures are suggested to calculate the cross-modal similarity of the entities in text and photograph by exploiting state-of-the-art computer vision approaches. In contrast to previous work, our system automatically acquires example data from the Web and is applicable to real-world news. Moreover, an approach that quantifies contextual image-text relations is introduced. The feasibility is demonstrated on two datasets that cover different languages, topics, and domains.
KW - Cross-modal consistency
KW - Image repurposing detection
KW - Image-text relations
KW - News analytics
UR - http://www.scopus.com/inward/record.url?scp=85105420523&partnerID=8YFLogxK
U2 - 10.1007/s13735-021-00207-4
DO - 10.1007/s13735-021-00207-4
M3 - Article
VL - 10
SP - 111
EP - 125
JO - Int. J. Multim. Inf. Retr.
JF - Int. J. Multim. Inf. Retr.
IS - 2
ER -