Estimating the Information Gap between Textual and Visual Representations

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Authors

  • Christian Henning
  • Ralph Ewerth

External Research Organisations

  • German National Library of Science and Technology (TIB)

Details

Original language: English
Title of host publication: ICMR 2017
Subtitle of host publication: Proceedings of the 2017 ACM International Conference on Multimedia Retrieval
Pages: 14-22
Number of pages: 9
ISBN (electronic): 9781450347013
Publication status: Published - 6 Jun 2017
Event: 17th ACM International Conference on Multimedia Retrieval, ICMR 2017 - Bucharest, Romania
Duration: 6 Jun 2017 – 9 Jun 2017

Publication series

Name: ICMR 2017 - Proceedings of the 2017 ACM International Conference on Multimedia Retrieval

Abstract

Photos, drawings, figures, etc. supplement textual information in various kinds of media, for example, in web news or scientific publications. The intended effect of an image can differ considerably, e.g., providing additional information, focusing on certain details of the surrounding text, or simply serving as a general illustration of a topic. As a consequence, the semantic correlation between the information of different modalities can vary noticeably, too. Moreover, cross-modal interrelations are often hard to describe in a precise way. The variety of possible interrelations between textual and graphical information, and the question of how they can be described and automatically estimated, have not been addressed by previous work. In this paper, we present several contributions to close this gap. First, we introduce two measures to describe cross-modal interrelations: cross-modal mutual information (CMI) and semantic correlation (SC). Second, a novel approach relying on deep learning is suggested to estimate the CMI and SC of textual and visual information. Third, three diverse datasets are leveraged to learn an appropriate deep neural network model for this demanding task. The system has been evaluated on a challenging test set, and the experimental results demonstrate the feasibility of the approach.
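As background to the first contribution: the name of the CMI measure alludes to the classical (Shannon) mutual information between two random variables, shown below. The paper's concrete, per-pair formulation of CMI and SC for image-text pairs is not reproduced in this record, so this is only the standard information-theoretic definition:

I(X;Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x,y) \, \log \frac{p(x,y)}{p(x)\,p(y)}

Here X and Y stand for the textual and visual modality, p(x,y) for their joint distribution, and p(x), p(y) for the marginals; I(X;Y) is zero when the modalities are independent and grows as they share more information, which matches the intuition of an "information gap" between text and image.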

Cite this

Estimating the Information Gap between Textual and Visual Representations. / Henning, Christian; Ewerth, Ralph.
ICMR 2017: Proceedings of the 2017 ACM International Conference on Multimedia Retrieval. 2017. p. 14-22 (ICMR 2017 - Proceedings of the 2017 ACM International Conference on Multimedia Retrieval).


Henning, C & Ewerth, R 2017, Estimating the Information Gap between Textual and Visual Representations. in ICMR 2017: Proceedings of the 2017 ACM International Conference on Multimedia Retrieval. ICMR 2017 - Proceedings of the 2017 ACM International Conference on Multimedia Retrieval, pp. 14-22, 17th ACM International Conference on Multimedia Retrieval, ICMR 2017, Bucharest, Romania, 6 Jun 2017. https://doi.org/10.1145/3078971.3078991
Henning, C., & Ewerth, R. (2017). Estimating the Information Gap between Textual and Visual Representations. In ICMR 2017: Proceedings of the 2017 ACM International Conference on Multimedia Retrieval (pp. 14-22). (ICMR 2017 - Proceedings of the 2017 ACM International Conference on Multimedia Retrieval). https://doi.org/10.1145/3078971.3078991
Henning C, Ewerth R. Estimating the Information Gap between Textual and Visual Representations. In ICMR 2017: Proceedings of the 2017 ACM International Conference on Multimedia Retrieval. 2017. p. 14-22. (ICMR 2017 - Proceedings of the 2017 ACM International Conference on Multimedia Retrieval). doi: 10.1145/3078971.3078991
Henning, Christian ; Ewerth, Ralph. / Estimating the Information Gap between Textual and Visual Representations. ICMR 2017: Proceedings of the 2017 ACM International Conference on Multimedia Retrieval. 2017. pp. 14-22 (ICMR 2017 - Proceedings of the 2017 ACM International Conference on Multimedia Retrieval).
@inproceedings{a0291c13e6d74baa96412917ec5c8ba6,
title = "Estimating the Information Gap between Textual and Visual Representations",
abstract = "Photos, drawings, figures, etc. supplement textual information in various kinds of media, for example, in web news or scientific publications. The intended effect of an image can differ considerably, e.g., providing additional information, focusing on certain details of the surrounding text, or simply serving as a general illustration of a topic. As a consequence, the semantic correlation between the information of different modalities can vary noticeably, too. Moreover, cross-modal interrelations are often hard to describe in a precise way. The variety of possible interrelations between textual and graphical information, and the question of how they can be described and automatically estimated, have not been addressed by previous work. In this paper, we present several contributions to close this gap. First, we introduce two measures to describe cross-modal interrelations: cross-modal mutual information (CMI) and semantic correlation (SC). Second, a novel approach relying on deep learning is suggested to estimate the CMI and SC of textual and visual information. Third, three diverse datasets are leveraged to learn an appropriate deep neural network model for this demanding task. The system has been evaluated on a challenging test set, and the experimental results demonstrate the feasibility of the approach.",
author = "Christian Henning and Ralph Ewerth",
year = "2017",
month = jun,
day = "6",
doi = "10.1145/3078971.3078991",
language = "English",
series = "ICMR 2017 - Proceedings of the 2017 ACM International Conference on Multimedia Retrieval",
pages = "14--22",
booktitle = "ICMR 2017",
note = "17th ACM International Conference on Multimedia Retrieval, ICMR 2017 ; Conference date: 06-06-2017 Through 09-06-2017",

}


TY - GEN

T1 - Estimating the Information Gap between Textual and Visual Representations

AU - Henning, Christian

AU - Ewerth, Ralph

PY - 2017/6/6

Y1 - 2017/6/6

N2 - Photos, drawings, figures, etc. supplement textual information in various kinds of media, for example, in web news or scientific publications. The intended effect of an image can differ considerably, e.g., providing additional information, focusing on certain details of the surrounding text, or simply serving as a general illustration of a topic. As a consequence, the semantic correlation between the information of different modalities can vary noticeably, too. Moreover, cross-modal interrelations are often hard to describe in a precise way. The variety of possible interrelations between textual and graphical information, and the question of how they can be described and automatically estimated, have not been addressed by previous work. In this paper, we present several contributions to close this gap. First, we introduce two measures to describe cross-modal interrelations: cross-modal mutual information (CMI) and semantic correlation (SC). Second, a novel approach relying on deep learning is suggested to estimate the CMI and SC of textual and visual information. Third, three diverse datasets are leveraged to learn an appropriate deep neural network model for this demanding task. The system has been evaluated on a challenging test set, and the experimental results demonstrate the feasibility of the approach.

AB - Photos, drawings, figures, etc. supplement textual information in various kinds of media, for example, in web news or scientific publications. The intended effect of an image can differ considerably, e.g., providing additional information, focusing on certain details of the surrounding text, or simply serving as a general illustration of a topic. As a consequence, the semantic correlation between the information of different modalities can vary noticeably, too. Moreover, cross-modal interrelations are often hard to describe in a precise way. The variety of possible interrelations between textual and graphical information, and the question of how they can be described and automatically estimated, have not been addressed by previous work. In this paper, we present several contributions to close this gap. First, we introduce two measures to describe cross-modal interrelations: cross-modal mutual information (CMI) and semantic correlation (SC). Second, a novel approach relying on deep learning is suggested to estimate the CMI and SC of textual and visual information. Third, three diverse datasets are leveraged to learn an appropriate deep neural network model for this demanding task. The system has been evaluated on a challenging test set, and the experimental results demonstrate the feasibility of the approach.

UR - http://www.scopus.com/inward/record.url?scp=85021839630&partnerID=8YFLogxK

U2 - 10.1145/3078971.3078991

DO - 10.1145/3078971.3078991

M3 - Conference contribution

AN - SCOPUS:85021839630

T3 - ICMR 2017 - Proceedings of the 2017 ACM International Conference on Multimedia Retrieval

SP - 14

EP - 22

BT - ICMR 2017

T2 - 17th ACM International Conference on Multimedia Retrieval, ICMR 2017

Y2 - 6 June 2017 through 9 June 2017

ER -