Estimating the Information Gap between Textual and Visual Representations

Publication: Contribution to book/report/anthology/conference proceedings › Paper in conference proceedings › Research › Peer-reviewed

Authors

  • Christian Henning
  • Ralph Ewerth

External organisations

  • Technische Informationsbibliothek (TIB) Leibniz-Informationszentrum Technik und Naturwissenschaften und Universitätsbibliothek

Details

Original language: English
Title of host publication: ICMR 2017
Subtitle: Proceedings of the 2017 ACM International Conference on Multimedia Retrieval
Pages: 14-22
Number of pages: 9
ISBN (electronic): 9781450347013
Publication status: Published - 6 June 2017
Event: 17th ACM International Conference on Multimedia Retrieval, ICMR 2017 - Bucharest, Romania
Duration: 6 June 2017 - 9 June 2017

Publication series

Name: ICMR 2017 - Proceedings of the 2017 ACM International Conference on Multimedia Retrieval

Abstract

Photos, drawings, figures, etc. supplement textual information in various kinds of media, for example, in web news or scientific publications. In this respect, the intended effect of an image can be quite different, e.g., providing additional information, focusing on certain details of the surrounding text, or simply serving as a general illustration of a topic. As a consequence, the semantic correlation between information of different modalities can vary noticeably, too. Moreover, cross-modal interrelations are often hard to describe in a precise way. The variety of possible interrelations between textual and graphical information, and the question of how they can be described and automatically estimated, have not been addressed by previous work. In this paper, we present several contributions to close this gap. First, we introduce two measures to describe cross-modal interrelations: cross-modal mutual information (CMI) and semantic correlation (SC). Second, a novel approach relying on deep learning is suggested to estimate the CMI and SC of textual and visual information. Third, three diverse datasets are leveraged to learn an appropriate deep neural network model for this demanding task. The system has been evaluated on a challenging test set, and the experimental results demonstrate the feasibility of the approach.
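
For context on the first contribution: CMI is the paper's own measure and is predicted by a trained deep neural network rather than computed in closed form. As background only, the classical definition of mutual information from information theory is shown below; the symbols T and V for the textual and visual variables are an illustrative choice, not notation taken from the paper.

\[
  I(T; V) = \sum_{t \in \mathcal{T}} \sum_{v \in \mathcal{V}} p(t, v) \, \log \frac{p(t, v)}{p(t)\, p(v)}
\]

Intuitively, I(T; V) is zero when the two modalities are statistically independent and grows as one becomes more predictive of the other; estimating the joint distribution p(t, v) directly is impractical for text-image pairs, which is why the paper instead trains a deep neural network on three datasets to predict CMI.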

Cite

Estimating the Information Gap between Textual and Visual Representations. / Henning, Christian; Ewerth, Ralph.
ICMR 2017 : Proceedings of the 2017 ACM International Conference on Multimedia Retrieval. 2017. pp. 14-22 (ICMR 2017 - Proceedings of the 2017 ACM International Conference on Multimedia Retrieval).

Henning, C & Ewerth, R 2017, Estimating the Information Gap between Textual and Visual Representations. in ICMR 2017 : Proceedings of the 2017 ACM International Conference on Multimedia Retrieval. ICMR 2017 - Proceedings of the 2017 ACM International Conference on Multimedia Retrieval, pp. 14-22, 17th ACM International Conference on Multimedia Retrieval, ICMR 2017, Bucharest, Romania, 6 June 2017. https://doi.org/10.1145/3078971.3078991
Henning, C., & Ewerth, R. (2017). Estimating the Information Gap between Textual and Visual Representations. In ICMR 2017 : Proceedings of the 2017 ACM International Conference on Multimedia Retrieval (pp. 14-22). (ICMR 2017 - Proceedings of the 2017 ACM International Conference on Multimedia Retrieval). https://doi.org/10.1145/3078971.3078991
Henning C, Ewerth R. Estimating the Information Gap between Textual and Visual Representations. In: ICMR 2017 : Proceedings of the 2017 ACM International Conference on Multimedia Retrieval. 2017. pp. 14-22. (ICMR 2017 - Proceedings of the 2017 ACM International Conference on Multimedia Retrieval). doi: 10.1145/3078971.3078991
Henning, Christian ; Ewerth, Ralph. / Estimating the Information Gap between Textual and Visual Representations. ICMR 2017 : Proceedings of the 2017 ACM International Conference on Multimedia Retrieval. 2017. pp. 14-22 (ICMR 2017 - Proceedings of the 2017 ACM International Conference on Multimedia Retrieval).
BibTeX
@inproceedings{a0291c13e6d74baa96412917ec5c8ba6,
title = "Estimating the Information Gap between Textual and Visual Representations",
abstract = "Photos, drawings, figures, etc. supplement textual information in various kinds of media, for example, in web news or scientific publications. In this respect, the intended effect of an image can be quite different, e.g., providing additional information, focusing on certain details of surrounding text, or simply being a general illustration of a topic. As a consequence, the semantic correlation between information of different modalities can vary noticeably, too. Moreover, cross-modal interrelations are often hard to describe in a precise way. The variety of possible interrelations of textual and graphical information and the question, how they can be described and automatically estimated have not been addressed yet by previous work. In this paper, we present several contributions to close this gap. First, we introduce two measures to describe crossmodal interrelations: cross-modal mutual information (CMI) and semantic correlation (SC). Second, a novel approach relying on deep learning is suggested to estimate CMI and SC of textual and visual information. Third, three diverse datasets are leveraged to learn an appropriate deep neural network model for the demanding task. The system has been evaluated on a challenging test set and the experimental results demonstrate the feasibility of the approach.",
author = "Christian Henning and Ralph Ewerth",
year = "2017",
month = jun,
day = "6",
doi = "10.1145/3078971.3078991",
language = "English",
series = "ICMR 2017 - Proceedings of the 2017 ACM International Conference on Multimedia Retrieval",
pages = "14--22",
booktitle = "ICMR 2017",
note = "17th ACM International Conference on Multimedia Retrieval, ICMR 2017 ; Conference date: 06-06-2017 Through 09-06-2017",

}
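
To cite the paper from a LaTeX document using the BibTeX entry above, a minimal sketch (the file name refs.bib and the sentence text are illustrative assumptions, not part of this record):

% main.tex -- assumes the BibTeX entry above is stored in refs.bib
\documentclass{article}
\begin{document}
Henning and Ewerth propose measures for cross-modal
interrelations~\cite{a0291c13e6d74baa96412917ec5c8ba6}.
\bibliographystyle{plain}
\bibliography{refs}
\end{document}

Compile with pdflatex main, then bibtex main, then pdflatex twice more to resolve the citation.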

RIS

TY - GEN

T1 - Estimating the Information Gap between Textual and Visual Representations

AU - Henning, Christian

AU - Ewerth, Ralph

PY - 2017/6/6

Y1 - 2017/6/6

N2 - Photos, drawings, figures, etc. supplement textual information in various kinds of media, for example, in web news or scientific publications. In this respect, the intended effect of an image can be quite different, e.g., providing additional information, focusing on certain details of the surrounding text, or simply serving as a general illustration of a topic. As a consequence, the semantic correlation between information of different modalities can vary noticeably, too. Moreover, cross-modal interrelations are often hard to describe in a precise way. The variety of possible interrelations between textual and graphical information, and the question of how they can be described and automatically estimated, have not been addressed by previous work. In this paper, we present several contributions to close this gap. First, we introduce two measures to describe cross-modal interrelations: cross-modal mutual information (CMI) and semantic correlation (SC). Second, a novel approach relying on deep learning is suggested to estimate the CMI and SC of textual and visual information. Third, three diverse datasets are leveraged to learn an appropriate deep neural network model for this demanding task. The system has been evaluated on a challenging test set, and the experimental results demonstrate the feasibility of the approach.

AB - Photos, drawings, figures, etc. supplement textual information in various kinds of media, for example, in web news or scientific publications. In this respect, the intended effect of an image can be quite different, e.g., providing additional information, focusing on certain details of the surrounding text, or simply serving as a general illustration of a topic. As a consequence, the semantic correlation between information of different modalities can vary noticeably, too. Moreover, cross-modal interrelations are often hard to describe in a precise way. The variety of possible interrelations between textual and graphical information, and the question of how they can be described and automatically estimated, have not been addressed by previous work. In this paper, we present several contributions to close this gap. First, we introduce two measures to describe cross-modal interrelations: cross-modal mutual information (CMI) and semantic correlation (SC). Second, a novel approach relying on deep learning is suggested to estimate the CMI and SC of textual and visual information. Third, three diverse datasets are leveraged to learn an appropriate deep neural network model for this demanding task. The system has been evaluated on a challenging test set, and the experimental results demonstrate the feasibility of the approach.

UR - http://www.scopus.com/inward/record.url?scp=85021839630&partnerID=8YFLogxK

U2 - 10.1145/3078971.3078991

DO - 10.1145/3078971.3078991

M3 - Conference contribution

AN - SCOPUS:85021839630

T3 - ICMR 2017 - Proceedings of the 2017 ACM International Conference on Multimedia Retrieval

SP - 14

EP - 22

BT - ICMR 2017

T2 - 17th ACM International Conference on Multimedia Retrieval, ICMR 2017

Y2 - 6 June 2017 through 9 June 2017

ER -