Details
| Original language | English |
| --- | --- |
| Title of host publication | ICMR 2017 |
| Subtitle of host publication | Proceedings of the 2017 ACM International Conference on Multimedia Retrieval |
| Pages | 14-22 |
| Number of pages | 9 |
| ISBN (electronic) | 9781450347013 |
| Publication status | Published - 6 Jun 2017 |
| Event | 17th ACM International Conference on Multimedia Retrieval, ICMR 2017, Bucharest, Romania, 6 Jun 2017 → 9 Jun 2017 |
Publication series
| Name | ICMR 2017 - Proceedings of the 2017 ACM International Conference on Multimedia Retrieval |
| --- | --- |
Abstract
Photos, drawings, figures, etc. supplement textual information in various kinds of media, for example, in web news or scientific publications. In this respect, the intended effect of an image can be quite different, e.g., providing additional information, focusing on certain details of the surrounding text, or simply serving as a general illustration of a topic. As a consequence, the semantic correlation between information of different modalities can vary noticeably, too. Moreover, cross-modal interrelations are often hard to describe in a precise way. The variety of possible interrelations of textual and graphical information, and the question of how they can be described and automatically estimated, have not been addressed by previous work. In this paper, we present several contributions to close this gap. First, we introduce two measures to describe cross-modal interrelations: cross-modal mutual information (CMI) and semantic correlation (SC). Second, a novel approach relying on deep learning is suggested to estimate CMI and SC of textual and visual information. Third, three diverse datasets are leveraged to learn an appropriate deep neural network model for this demanding task. The system has been evaluated on a challenging test set, and the experimental results demonstrate the feasibility of the approach.
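The abstract describes a deep learning approach that estimates the two measures, CMI and SC, for an image-text pair, but this record does not include the model itself. As a rough orientation only, below is a minimal sketch of how such an estimator could be wired up in PyTorch; the embedding dimensions, the fusion MLP, the two regression heads, and all names are assumptions made for illustration, not the architecture from the paper (see DOI 10.1145/3078971.3078991 for the actual method).

```python
# Illustrative sketch only: the record above does not specify the authors' architecture.
# Assumed here: precomputed text and image embeddings are fused by a small MLP that
# regresses the two target measures, cross-modal mutual information (CMI) and
# semantic correlation (SC). All layer sizes and names are hypothetical.
import torch
import torch.nn as nn

class CrossModalEstimator(nn.Module):
    def __init__(self, text_dim: int = 300, image_dim: int = 2048, hidden: int = 512):
        super().__init__()
        # Project both modalities into a shared space before fusion.
        self.text_proj = nn.Linear(text_dim, hidden)
        self.image_proj = nn.Linear(image_dim, hidden)
        self.fusion = nn.Sequential(
            nn.Linear(2 * hidden, hidden),
            nn.ReLU(),
        )
        # One head per measure; the paper might instead predict discrete levels,
        # scalar regression is simply assumed here.
        self.cmi_head = nn.Linear(hidden, 1)
        self.sc_head = nn.Linear(hidden, 1)

    def forward(self, text_emb: torch.Tensor, image_emb: torch.Tensor):
        t = torch.relu(self.text_proj(text_emb))
        v = torch.relu(self.image_proj(image_emb))
        h = self.fusion(torch.cat([t, v], dim=-1))
        return self.cmi_head(h).squeeze(-1), self.sc_head(h).squeeze(-1)

# Usage with random stand-in embeddings for a batch of 4 image-text pairs.
model = CrossModalEstimator()
cmi, sc = model(torch.randn(4, 300), torch.randn(4, 2048))
print(cmi.shape, sc.shape)  # torch.Size([4]) torch.Size([4])
```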
ASJC Scopus subject areas
- Computer Science (all)
- Human-Computer Interaction
- Software
- Computer Graphics and Computer-Aided Design
Cite this
Henning C, Ewerth R. Estimating the Information Gap between Textual and Visual Representations. In ICMR 2017: Proceedings of the 2017 ACM International Conference on Multimedia Retrieval. 2017. p. 14-22. (ICMR 2017 - Proceedings of the 2017 ACM International Conference on Multimedia Retrieval). doi: 10.1145/3078971.3078991
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Estimating the Information Gap between Textual and Visual Representations
AU - Henning, Christian
AU - Ewerth, Ralph
PY - 2017/6/6
Y1 - 2017/6/6
N2 - Photos, drawings, figures, etc. supplement textual information in various kinds of media, for example, in web news or scientific publications. In this respect, the intended effect of an image can be quite different, e.g., providing additional information, focusing on certain details of the surrounding text, or simply serving as a general illustration of a topic. As a consequence, the semantic correlation between information of different modalities can vary noticeably, too. Moreover, cross-modal interrelations are often hard to describe in a precise way. The variety of possible interrelations of textual and graphical information, and the question of how they can be described and automatically estimated, have not been addressed by previous work. In this paper, we present several contributions to close this gap. First, we introduce two measures to describe cross-modal interrelations: cross-modal mutual information (CMI) and semantic correlation (SC). Second, a novel approach relying on deep learning is suggested to estimate CMI and SC of textual and visual information. Third, three diverse datasets are leveraged to learn an appropriate deep neural network model for this demanding task. The system has been evaluated on a challenging test set, and the experimental results demonstrate the feasibility of the approach.
AB - Photos, drawings, figures, etc. supplement textual information in various kinds of media, for example, in web news or scientific publications. In this respect, the intended effect of an image can be quite different, e.g., providing additional information, focusing on certain details of the surrounding text, or simply serving as a general illustration of a topic. As a consequence, the semantic correlation between information of different modalities can vary noticeably, too. Moreover, cross-modal interrelations are often hard to describe in a precise way. The variety of possible interrelations of textual and graphical information, and the question of how they can be described and automatically estimated, have not been addressed by previous work. In this paper, we present several contributions to close this gap. First, we introduce two measures to describe cross-modal interrelations: cross-modal mutual information (CMI) and semantic correlation (SC). Second, a novel approach relying on deep learning is suggested to estimate CMI and SC of textual and visual information. Third, three diverse datasets are leveraged to learn an appropriate deep neural network model for this demanding task. The system has been evaluated on a challenging test set, and the experimental results demonstrate the feasibility of the approach.
UR - http://www.scopus.com/inward/record.url?scp=85021839630&partnerID=8YFLogxK
U2 - 10.1145/3078971.3078991
DO - 10.1145/3078971.3078991
M3 - Conference contribution
AN - SCOPUS:85021839630
T3 - ICMR 2017 - Proceedings of the 2017 ACM International Conference on Multimedia Retrieval
SP - 14
EP - 22
BT - ICMR 2017
T2 - 17th ACM International Conference on Multimedia Retrieval, ICMR 2017
Y2 - 6 June 2017 through 9 June 2017
ER -