Estimating the Information Gap between Textual and Visual Representations

Publication: Contribution to book/report/anthology/conference proceedings › Paper in conference proceedings › Research › Peer-reviewed

Authors

  • Christian Henning
  • Ralph Ewerth

External organisations

  • Technische Informationsbibliothek (TIB) Leibniz-Informationszentrum Technik und Naturwissenschaften und Universitätsbibliothek

Details

Original language: English
Title of host publication: ICMR 2017
Subtitle: Proceedings of the 2017 ACM International Conference on Multimedia Retrieval
Pages: 14-22
Number of pages: 9
ISBN (electronic): 9781450347013
Publication status: Published - 6 June 2017
Event: 17th ACM International Conference on Multimedia Retrieval, ICMR 2017 - Bucharest, Romania
Duration: 6 June 2017 - 9 June 2017

Publication series

Name: ICMR 2017 - Proceedings of the 2017 ACM International Conference on Multimedia Retrieval

Abstract

Photos, drawings, figures, etc. supplement textual information in various kinds of media, for example, in web news or scientific publications. In this respect, the intended effect of an image can be quite different, e.g., providing additional information, focusing on certain details of the surrounding text, or simply serving as a general illustration of a topic. As a consequence, the semantic correlation between information of different modalities can vary noticeably, too. Moreover, cross-modal interrelations are often hard to describe in a precise way. The variety of possible interrelations between textual and graphical information, and the question of how they can be described and automatically estimated, have not been addressed by previous work. In this paper, we present several contributions to close this gap. First, we introduce two measures to describe cross-modal interrelations: cross-modal mutual information (CMI) and semantic correlation (SC). Second, a novel approach relying on deep learning is suggested to estimate the CMI and SC of textual and visual information. Third, three diverse datasets are leveraged to learn an appropriate deep neural network model for this demanding task. The system has been evaluated on a challenging test set, and the experimental results demonstrate the feasibility of the approach.
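
For context on the first contribution: CMI is the paper's own measure and is predicted by a trained deep neural network rather than computed in closed form. As background only, the classical definition of mutual information from information theory is shown below; the symbols T and V for the textual and visual variables are an illustrative choice, not notation taken from the paper.

\[
  I(T; V) = \sum_{t \in \mathcal{T}} \sum_{v \in \mathcal{V}} p(t, v) \, \log \frac{p(t, v)}{p(t)\, p(v)}
\]

Intuitively, I(T; V) is zero when the two modalities are statistically independent and grows as one becomes more predictive of the other; estimating the joint distribution p(t, v) directly is impractical for text-image pairs, which is why the paper instead trains a deep neural network on three datasets to predict CMI.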

Cite

Estimating the Information Gap between Textual and Visual Representations. / Henning, Christian; Ewerth, Ralph.
ICMR 2017 : Proceedings of the 2017 ACM International Conference on Multimedia Retrieval. 2017. pp. 14-22 (ICMR 2017 - Proceedings of the 2017 ACM International Conference on Multimedia Retrieval).

Henning, C & Ewerth, R 2017, Estimating the Information Gap between Textual and Visual Representations. in ICMR 2017 : Proceedings of the 2017 ACM International Conference on Multimedia Retrieval. ICMR 2017 - Proceedings of the 2017 ACM International Conference on Multimedia Retrieval, pp. 14-22, 17th ACM International Conference on Multimedia Retrieval, ICMR 2017, Bucharest, Romania, 6 June 2017. https://doi.org/10.1145/3078971.3078991
Henning, C., & Ewerth, R. (2017). Estimating the Information Gap between Textual and Visual Representations. In ICMR 2017 : Proceedings of the 2017 ACM International Conference on Multimedia Retrieval (pp. 14-22). (ICMR 2017 - Proceedings of the 2017 ACM International Conference on Multimedia Retrieval). https://doi.org/10.1145/3078971.3078991
Henning C, Ewerth R. Estimating the Information Gap between Textual and Visual Representations. In: ICMR 2017 : Proceedings of the 2017 ACM International Conference on Multimedia Retrieval. 2017. pp. 14-22. (ICMR 2017 - Proceedings of the 2017 ACM International Conference on Multimedia Retrieval). doi: 10.1145/3078971.3078991
Henning, Christian ; Ewerth, Ralph. / Estimating the Information Gap between Textual and Visual Representations. ICMR 2017 : Proceedings of the 2017 ACM International Conference on Multimedia Retrieval. 2017. pp. 14-22 (ICMR 2017 - Proceedings of the 2017 ACM International Conference on Multimedia Retrieval).
BibTeX
@inproceedings{a0291c13e6d74baa96412917ec5c8ba6,
title = "Estimating the Information Gap between Textual and Visual Representations",
abstract = "Photos, drawings, figures, etc. supplement textual information in various kinds of media, for example, in web news or scientific publications. In this respect, the intended effect of an image can be quite different, e.g., providing additional information, focusing on certain details of surrounding text, or simply being a general illustration of a topic. As a consequence, the semantic correlation between information of different modalities can vary noticeably, too. Moreover, cross-modal interrelations are often hard to describe in a precise way. The variety of possible interrelations of textual and graphical information and the question, how they can be described and automatically estimated have not been addressed yet by previous work. In this paper, we present several contributions to close this gap. First, we introduce two measures to describe crossmodal interrelations: cross-modal mutual information (CMI) and semantic correlation (SC). Second, a novel approach relying on deep learning is suggested to estimate CMI and SC of textual and visual information. Third, three diverse datasets are leveraged to learn an appropriate deep neural network model for the demanding task. The system has been evaluated on a challenging test set and the experimental results demonstrate the feasibility of the approach.",
author = "Christian Henning and Ralph Ewerth",
year = "2017",
month = jun,
day = "6",
doi = "10.1145/3078971.3078991",
language = "English",
series = "ICMR 2017 - Proceedings of the 2017 ACM International Conference on Multimedia Retrieval",
pages = "14--22",
booktitle = "ICMR 2017",
note = "17th ACM International Conference on Multimedia Retrieval, ICMR 2017 ; Conference date: 06-06-2017 Through 09-06-2017",

}
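
To cite the paper from a LaTeX document using the BibTeX entry above, a minimal sketch (the file name refs.bib and the sentence text are illustrative assumptions, not part of this record):

% main.tex -- assumes the BibTeX entry above is stored in refs.bib
\documentclass{article}
\begin{document}
Henning and Ewerth propose measures for cross-modal
interrelations~\cite{a0291c13e6d74baa96412917ec5c8ba6}.
\bibliographystyle{plain}
\bibliography{refs}
\end{document}

Compile with pdflatex main, then bibtex main, then pdflatex twice more to resolve the citation.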

RIS

TY - GEN

T1 - Estimating the Information Gap between Textual and Visual Representations

AU - Henning, Christian

AU - Ewerth, Ralph

PY - 2017/6/6

Y1 - 2017/6/6

N2 - Photos, drawings, figures, etc. supplement textual information in various kinds of media, for example, in web news or scientific publications. In this respect, the intended effect of an image can be quite different, e.g., providing additional information, focusing on certain details of the surrounding text, or simply serving as a general illustration of a topic. As a consequence, the semantic correlation between information of different modalities can vary noticeably, too. Moreover, cross-modal interrelations are often hard to describe in a precise way. The variety of possible interrelations between textual and graphical information, and the question of how they can be described and automatically estimated, have not been addressed by previous work. In this paper, we present several contributions to close this gap. First, we introduce two measures to describe cross-modal interrelations: cross-modal mutual information (CMI) and semantic correlation (SC). Second, a novel approach relying on deep learning is suggested to estimate the CMI and SC of textual and visual information. Third, three diverse datasets are leveraged to learn an appropriate deep neural network model for this demanding task. The system has been evaluated on a challenging test set, and the experimental results demonstrate the feasibility of the approach.

AB - Photos, drawings, figures, etc. supplement textual information in various kinds of media, for example, in web news or scientific publications. In this respect, the intended effect of an image can be quite different, e.g., providing additional information, focusing on certain details of the surrounding text, or simply serving as a general illustration of a topic. As a consequence, the semantic correlation between information of different modalities can vary noticeably, too. Moreover, cross-modal interrelations are often hard to describe in a precise way. The variety of possible interrelations between textual and graphical information, and the question of how they can be described and automatically estimated, have not been addressed by previous work. In this paper, we present several contributions to close this gap. First, we introduce two measures to describe cross-modal interrelations: cross-modal mutual information (CMI) and semantic correlation (SC). Second, a novel approach relying on deep learning is suggested to estimate the CMI and SC of textual and visual information. Third, three diverse datasets are leveraged to learn an appropriate deep neural network model for this demanding task. The system has been evaluated on a challenging test set, and the experimental results demonstrate the feasibility of the approach.

UR - http://www.scopus.com/inward/record.url?scp=85021839630&partnerID=8YFLogxK

U2 - 10.1145/3078971.3078991

DO - 10.1145/3078971.3078991

M3 - Conference contribution

AN - SCOPUS:85021839630

T3 - ICMR 2017 - Proceedings of the 2017 ACM International Conference on Multimedia Retrieval

SP - 14

EP - 22

BT - ICMR 2017

T2 - 17th ACM International Conference on Multimedia Retrieval, ICMR 2017

Y2 - 6 June 2017 through 9 June 2017

ER -