Interpretable Visual Understanding with Cognitive Attention Network

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Authors

  • Xuejiao Tang
  • Wenbin Zhang
  • Yi Yu
  • Kea Turner
  • Tyler Derr
  • Mengyu Wang
  • Eirini Ntoutsi

External Research Organisations

  • Carnegie Mellon University
  • Research Organization of Information and Systems National Institute of Informatics
  • University of South Florida
  • Vanderbilt University
  • Harvard University
  • Freie Universität Berlin (FU Berlin)

Details

Original language: English
Title of host publication: Artificial Neural Networks and Machine Learning – ICANN 2021 - 30th International Conference on Artificial Neural Networks, Proceedings
Editors: Igor Farkaš, Paolo Masulli, Sebastian Otte, Stefan Wermter
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 555-568
Number of pages: 14
ISBN (electronic): 978-3-030-86362-3
ISBN (print): 9783030863616
Publication status: Published - 2021
Externally published: Yes
Event: 30th International Conference on Artificial Neural Networks, ICANN 2021 - Virtual, Online
Duration: 14 Sept 2021 to 17 Sept 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12891 LNCS
ISSN (Print): 0302-9743
ISSN (electronic): 1611-3349

Abstract

While recognition-level image understanding has advanced remarkably, reliable visual scene understanding requires comprehensive image understanding not only at the recognition level but also at the cognition level, which calls for exploiting multi-source information as well as learning different levels of understanding and extensive commonsense knowledge. In this paper, we propose a novel Cognitive Attention Network (CAN) for visual commonsense reasoning to achieve interpretable visual understanding. Specifically, we first introduce an image-text fusion module to fuse information from images and text collectively. Second, a novel inference module is designed to encode commonsense among image, query and response. Extensive experiments on the large-scale Visual Commonsense Reasoning (VCR) benchmark dataset demonstrate the effectiveness of our approach. The implementation is publicly available at https://github.com/tanjatang/CAN.

Cite this

Interpretable Visual Understanding with Cognitive Attention Network. / Tang, Xuejiao; Zhang, Wenbin; Yu, Yi et al.
Artificial Neural Networks and Machine Learning – ICANN 2021 - 30th International Conference on Artificial Neural Networks, Proceedings. ed. / Igor Farkaš; Paolo Masulli; Sebastian Otte; Stefan Wermter. Springer Science and Business Media Deutschland GmbH, 2021. p. 555-568 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12891 LNCS).

Tang, X, Zhang, W, Yu, Y, Turner, K, Derr, T, Wang, M & Ntoutsi, E 2021, Interpretable Visual Understanding with Cognitive Attention Network. in I Farkaš, P Masulli, S Otte & S Wermter (eds), Artificial Neural Networks and Machine Learning – ICANN 2021 - 30th International Conference on Artificial Neural Networks, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12891 LNCS, Springer Science and Business Media Deutschland GmbH, pp. 555-568, 30th International Conference on Artificial Neural Networks, ICANN 2021, Virtual, Online, 14 Sept 2021. https://doi.org/10.48550/arXiv.2108.02924, https://doi.org/10.1007/978-3-030-86362-3_45
Tang, X., Zhang, W., Yu, Y., Turner, K., Derr, T., Wang, M., & Ntoutsi, E. (2021). Interpretable Visual Understanding with Cognitive Attention Network. In I. Farkaš, P. Masulli, S. Otte, & S. Wermter (Eds.), Artificial Neural Networks and Machine Learning – ICANN 2021 - 30th International Conference on Artificial Neural Networks, Proceedings (pp. 555-568). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12891 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.48550/arXiv.2108.02924, https://doi.org/10.1007/978-3-030-86362-3_45
Tang X, Zhang W, Yu Y, Turner K, Derr T, Wang M et al. Interpretable Visual Understanding with Cognitive Attention Network. In Farkaš I, Masulli P, Otte S, Wermter S, editors, Artificial Neural Networks and Machine Learning – ICANN 2021 - 30th International Conference on Artificial Neural Networks, Proceedings. Springer Science and Business Media Deutschland GmbH. 2021. p. 555-568. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). Epub 2021 Sept 7. doi: 10.48550/arXiv.2108.02924, 10.1007/978-3-030-86362-3_45
Tang, Xuejiao ; Zhang, Wenbin ; Yu, Yi et al. / Interpretable Visual Understanding with Cognitive Attention Network. Artificial Neural Networks and Machine Learning – ICANN 2021 - 30th International Conference on Artificial Neural Networks, Proceedings. editor / Igor Farkaš ; Paolo Masulli ; Sebastian Otte ; Stefan Wermter. Springer Science and Business Media Deutschland GmbH, 2021. pp. 555-568 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{874ac6938dc24e80a92b2bdcef552558,
title = "Interpretable Visual Understanding with Cognitive Attention Network",
abstract = "While image understanding on recognition-level has achieved remarkable advancements, reliable visual scene understanding requires comprehensive image understanding on recognition-level but also cognition-level, which calls for exploiting the multi-source information as well as learning different levels of understanding and extensive commonsense knowledge. In this paper, we propose a novel Cognitive Attention Network (CAN) for visual commonsense reasoning to achieve interpretable visual understanding. Specifically, we first introduce an image-text fusion module to fuse information from images and text collectively. Second, a novel inference module is designed to encode commonsense among image, query and response. Extensive experiments on large-scale Visual Commonsense Reasoning (VCR) benchmark dataset demonstrate the effectiveness of our approach. The implementation is publicly available at https://github.com/tanjatang/CAN.",
author = "Xuejiao Tang and Wenbin Zhang and Yi Yu and Kea Turner and Tyler Derr and Mengyu Wang and Eirini Ntoutsi",
year = "2021",
doi = "10.48550/arXiv.2108.02924",
language = "English",
isbn = "9783030863616",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Science and Business Media Deutschland GmbH",
pages = "555--568",
editor = "Igor Farka{\v s} and Paolo Masulli and Sebastian Otte and Stefan Wermter",
booktitle = "Artificial Neural Networks and Machine Learning – ICANN 2021 - 30th International Conference on Artificial Neural Networks, Proceedings",
address = "Germany",
note = "30th International Conference on Artificial Neural Networks, ICANN 2021 ; Conference date: 14-09-2021 Through 17-09-2021",

}

TY - GEN
T1 - Interpretable Visual Understanding with Cognitive Attention Network
AU - Tang, Xuejiao
AU - Zhang, Wenbin
AU - Yu, Yi
AU - Turner, Kea
AU - Derr, Tyler
AU - Wang, Mengyu
AU - Ntoutsi, Eirini
PY - 2021
Y1 - 2021
AB - While image understanding on recognition-level has achieved remarkable advancements, reliable visual scene understanding requires comprehensive image understanding on recognition-level but also cognition-level, which calls for exploiting the multi-source information as well as learning different levels of understanding and extensive commonsense knowledge. In this paper, we propose a novel Cognitive Attention Network (CAN) for visual commonsense reasoning to achieve interpretable visual understanding. Specifically, we first introduce an image-text fusion module to fuse information from images and text collectively. Second, a novel inference module is designed to encode commonsense among image, query and response. Extensive experiments on large-scale Visual Commonsense Reasoning (VCR) benchmark dataset demonstrate the effectiveness of our approach. The implementation is publicly available at https://github.com/tanjatang/CAN.
UR - http://www.scopus.com/inward/record.url?scp=85115445553&partnerID=8YFLogxK
U2 - 10.48550/arXiv.2108.02924
DO - 10.48550/arXiv.2108.02924
M3 - Conference contribution
AN - SCOPUS:85115445553
SN - 9783030863616
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 555
EP - 568
BT - Artificial Neural Networks and Machine Learning – ICANN 2021 - 30th International Conference on Artificial Neural Networks, Proceedings
A2 - Farkaš, Igor
A2 - Masulli, Paolo
A2 - Otte, Sebastian
A2 - Wermter, Stefan
PB - Springer Science and Business Media Deutschland GmbH
T2 - 30th International Conference on Artificial Neural Networks, ICANN 2021
Y2 - 14 September 2021 through 17 September 2021
ER -