Interpretable Visual Understanding with Cognitive Attention Network

Publication: Contribution to book/report/anthology/conference proceedings; Conference paper; Research; Peer-reviewed

Authors

  • Xuejiao Tang
  • Wenbin Zhang
  • Yi Yu
  • Kea Turner
  • Tyler Derr
  • Mengyu Wang
  • Eirini Ntoutsi

External organisations

  • Carnegie Mellon University
  • Research Organization of Information and Systems National Institute of Informatics
  • University of South Florida
  • Vanderbilt University
  • Harvard University
  • Freie Universität Berlin (FU Berlin)

Details

Original language: English
Title of host publication: Artificial Neural Networks and Machine Learning – ICANN 2021 - 30th International Conference on Artificial Neural Networks, Proceedings
Editors: Igor Farkaš, Paolo Masulli, Sebastian Otte, Stefan Wermter
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 555-568
Number of pages: 14
ISBN (electronic): 978-3-030-86362-3
ISBN (print): 978-3-030-86361-6
Publication status: Published - 2021
Externally published: Yes
Event: 30th International Conference on Artificial Neural Networks, ICANN 2021 - Virtual, Online
Duration: 14 Sept. 2021 to 17 Sept. 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12891 LNCS
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Abstract

While recognition-level image understanding has advanced remarkably, reliable visual scene understanding requires comprehensive image understanding at the cognition level as well as the recognition level. This calls for exploiting multi-source information, learning different levels of understanding, and drawing on extensive commonsense knowledge. In this paper, we propose a novel Cognitive Attention Network (CAN) for visual commonsense reasoning to achieve interpretable visual understanding. Specifically, we first introduce an image-text fusion module to fuse information from images and text collectively. Second, a novel inference module is designed to encode commonsense among image, query and response. Extensive experiments on the large-scale Visual Commonsense Reasoning (VCR) benchmark dataset demonstrate the effectiveness of our approach. The implementation is publicly available at https://github.com/tanjatang/CAN.

Cite this

Interpretable Visual Understanding with Cognitive Attention Network. / Tang, Xuejiao; Zhang, Wenbin; Yu, Yi et al.
Artificial Neural Networks and Machine Learning – ICANN 2021 - 30th International Conference on Artificial Neural Networks, Proceedings. ed. / Igor Farkaš; Paolo Masulli; Sebastian Otte; Stefan Wermter. Springer Science and Business Media Deutschland GmbH, 2021. pp. 555-568 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12891 LNCS).

Tang, X, Zhang, W, Yu, Y, Turner, K, Derr, T, Wang, M & Ntoutsi, E 2021, Interpretable Visual Understanding with Cognitive Attention Network. in I Farkaš, P Masulli, S Otte & S Wermter (eds), Artificial Neural Networks and Machine Learning – ICANN 2021 - 30th International Conference on Artificial Neural Networks, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12891 LNCS, Springer Science and Business Media Deutschland GmbH, pp. 555-568, 30th International Conference on Artificial Neural Networks, ICANN 2021, Virtual, Online, 14 Sept. 2021. https://doi.org/10.48550/arXiv.2108.02924, https://doi.org/10.1007/978-3-030-86362-3_45
Tang, X., Zhang, W., Yu, Y., Turner, K., Derr, T., Wang, M., & Ntoutsi, E. (2021). Interpretable Visual Understanding with Cognitive Attention Network. In I. Farkaš, P. Masulli, S. Otte, & S. Wermter (Eds.), Artificial Neural Networks and Machine Learning – ICANN 2021 - 30th International Conference on Artificial Neural Networks, Proceedings (pp. 555-568). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12891 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.48550/arXiv.2108.02924, https://doi.org/10.1007/978-3-030-86362-3_45
Tang X, Zhang W, Yu Y, Turner K, Derr T, Wang M et al. Interpretable Visual Understanding with Cognitive Attention Network. In Farkaš I, Masulli P, Otte S, Wermter S, editors, Artificial Neural Networks and Machine Learning – ICANN 2021 - 30th International Conference on Artificial Neural Networks, Proceedings. Springer Science and Business Media Deutschland GmbH. 2021. p. 555-568. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). Epub 2021 Sep 7. doi: 10.48550/arXiv.2108.02924, 10.1007/978-3-030-86362-3_45
Tang, Xuejiao ; Zhang, Wenbin ; Yu, Yi et al. / Interpretable Visual Understanding with Cognitive Attention Network. Artificial Neural Networks and Machine Learning – ICANN 2021 - 30th International Conference on Artificial Neural Networks, Proceedings. ed. / Igor Farkaš ; Paolo Masulli ; Sebastian Otte ; Stefan Wermter. Springer Science and Business Media Deutschland GmbH, 2021. pp. 555-568 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{874ac6938dc24e80a92b2bdcef552558,
title = "Interpretable Visual Understanding with Cognitive Attention Network",
abstract = "While image understanding on recognition-level has achieved remarkable advancements, reliable visual scene understanding requires comprehensive image understanding on recognition-level but also cognition-level, which calls for exploiting the multi-source information as well as learning different levels of understanding and extensive commonsense knowledge. In this paper, we propose a novel Cognitive Attention Network (CAN) for visual commonsense reasoning to achieve interpretable visual understanding. Specifically, we first introduce an image-text fusion module to fuse information from images and text collectively. Second, a novel inference module is designed to encode commonsense among image, query and response. Extensive experiments on large-scale Visual Commonsense Reasoning (VCR) benchmark dataset demonstrate the effectiveness of our approach. The implementation is publicly available at https://github.com/tanjatang/CAN.",
author = "Xuejiao Tang and Wenbin Zhang and Yi Yu and Kea Turner and Tyler Derr and Mengyu Wang and Eirini Ntoutsi",
year = "2021",
doi = "10.48550/arXiv.2108.02924",
language = "English",
isbn = "9783030863616",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Science and Business Media Deutschland GmbH",
pages = "555--568",
editor = "Igor Farka{\v s} and Paolo Masulli and Sebastian Otte and Stefan Wermter",
booktitle = "Artificial Neural Networks and Machine Learning – ICANN 2021 - 30th International Conference on Artificial Neural Networks, Proceedings",
address = "Germany",
note = "30th International Conference on Artificial Neural Networks, ICANN 2021 ; Conference date: 14-09-2021 Through 17-09-2021",

}

TY  - GEN
T1  - Interpretable Visual Understanding with Cognitive Attention Network
AU  - Tang, Xuejiao
AU  - Zhang, Wenbin
AU  - Yu, Yi
AU  - Turner, Kea
AU  - Derr, Tyler
AU  - Wang, Mengyu
AU  - Ntoutsi, Eirini
PY  - 2021
Y1  - 2021
N2  - While image understanding on recognition-level has achieved remarkable advancements, reliable visual scene understanding requires comprehensive image understanding on recognition-level but also cognition-level, which calls for exploiting the multi-source information as well as learning different levels of understanding and extensive commonsense knowledge. In this paper, we propose a novel Cognitive Attention Network (CAN) for visual commonsense reasoning to achieve interpretable visual understanding. Specifically, we first introduce an image-text fusion module to fuse information from images and text collectively. Second, a novel inference module is designed to encode commonsense among image, query and response. Extensive experiments on large-scale Visual Commonsense Reasoning (VCR) benchmark dataset demonstrate the effectiveness of our approach. The implementation is publicly available at https://github.com/tanjatang/CAN.
AB  - While image understanding on recognition-level has achieved remarkable advancements, reliable visual scene understanding requires comprehensive image understanding on recognition-level but also cognition-level, which calls for exploiting the multi-source information as well as learning different levels of understanding and extensive commonsense knowledge. In this paper, we propose a novel Cognitive Attention Network (CAN) for visual commonsense reasoning to achieve interpretable visual understanding. Specifically, we first introduce an image-text fusion module to fuse information from images and text collectively. Second, a novel inference module is designed to encode commonsense among image, query and response. Extensive experiments on large-scale Visual Commonsense Reasoning (VCR) benchmark dataset demonstrate the effectiveness of our approach. The implementation is publicly available at https://github.com/tanjatang/CAN.
UR  - http://www.scopus.com/inward/record.url?scp=85115445553&partnerID=8YFLogxK
U2  - 10.48550/arXiv.2108.02924
DO  - 10.48550/arXiv.2108.02924
M3  - Conference contribution
AN  - SCOPUS:85115445553
SN  - 9783030863616
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 555
EP  - 568
BT  - Artificial Neural Networks and Machine Learning – ICANN 2021 - 30th International Conference on Artificial Neural Networks, Proceedings
A2  - Farkaš, Igor
A2  - Masulli, Paolo
A2  - Otte, Sebastian
A2  - Wermter, Stefan
PB  - Springer Science and Business Media Deutschland GmbH
T2  - 30th International Conference on Artificial Neural Networks, ICANN 2021
Y2  - 14 September 2021 through 17 September 2021
ER  -