Interpretable Visual Understanding with Cognitive Attention Network

Xuejiao Tang; Wenbin Zhang; Yi Yu; Kea Turner; Tyler Derr; Mengyu Wang; Eirini Ntoutsi

doi:10.48550/arXiv.2108.02924

Details

Original language	English
Title of host publication	Artificial Neural Networks and Machine Learning – ICANN 2021 - 30th International Conference on Artificial Neural Networks, Proceedings
Editors	Igor Farkaš, Paolo Masulli, Sebastian Otte, Stefan Wermter
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	555-568
Number of pages	14
ISBN (electronic)	978-3-030-86362-3
ISBN (print)	9783030863616
Publication status	Published - 2021
Published externally	Yes
Event	30th International Conference on Artificial Neural Networks, ICANN 2021 - Virtual, Online. Duration: 14 Sept. 2021 → 17 Sept. 2021

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	12891 LNCS
ISSN (print)	0302-9743
ISSN (electronic)	1611-3349

Abstract

While image understanding at the recognition level has advanced remarkably, reliable visual scene understanding requires comprehensive image understanding not only at the recognition level but also at the cognition level, which calls for exploiting multi-source information as well as learning different levels of understanding and extensive commonsense knowledge. In this paper, we propose a novel Cognitive Attention Network (CAN) for visual commonsense reasoning to achieve interpretable visual understanding. Specifically, we first introduce an image-text fusion module to fuse information from images and text collectively. Second, a novel inference module is designed to encode commonsense among image, query and response. Extensive experiments on the large-scale Visual Commonsense Reasoning (VCR) benchmark dataset demonstrate the effectiveness of our approach. The implementation is publicly available at https://github.com/tanjatang/CAN.

ASJC Scopus subject areas

Mathematics (all)
Theoretical Computer Science
Computer Science (all)
General Computer Science

Cite this

Interpretable Visual Understanding with Cognitive Attention Network. / Tang, Xuejiao; Zhang, Wenbin; Yu, Yi et al.
Artificial Neural Networks and Machine Learning – ICANN 2021 - 30th International Conference on Artificial Neural Networks, Proceedings. ed. / Igor Farkaš; Paolo Masulli; Sebastian Otte; Stefan Wermter. Springer Science and Business Media Deutschland GmbH, 2021. p. 555-568 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12891 LNCS).

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › Peer-reviewed

Tang, X, Zhang, W, Yu, Y, Turner, K, Derr, T, Wang, M & Ntoutsi, E 2021, Interpretable Visual Understanding with Cognitive Attention Network. in I Farkaš, P Masulli, S Otte & S Wermter (eds), Artificial Neural Networks and Machine Learning – ICANN 2021 - 30th International Conference on Artificial Neural Networks, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12891 LNCS, Springer Science and Business Media Deutschland GmbH, pp. 555-568, 30th International Conference on Artificial Neural Networks, ICANN 2021, Virtual, Online, 14 Sept. 2021. https://doi.org/10.48550/arXiv.2108.02924, https://doi.org/10.1007/978-3-030-86362-3_45

Tang, X., Zhang, W., Yu, Y., Turner, K., Derr, T., Wang, M., & Ntoutsi, E. (2021). Interpretable Visual Understanding with Cognitive Attention Network. In I. Farkaš, P. Masulli, S. Otte, & S. Wermter (Eds.), Artificial Neural Networks and Machine Learning – ICANN 2021 - 30th International Conference on Artificial Neural Networks, Proceedings (pp. 555-568). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12891 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.48550/arXiv.2108.02924, https://doi.org/10.1007/978-3-030-86362-3_45

Tang X, Zhang W, Yu Y, Turner K, Derr T, Wang M et al. Interpretable Visual Understanding with Cognitive Attention Network. In Farkaš I, Masulli P, Otte S, Wermter S, editors, Artificial Neural Networks and Machine Learning – ICANN 2021 - 30th International Conference on Artificial Neural Networks, Proceedings. Springer Science and Business Media Deutschland GmbH. 2021. p. 555-568. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). Epub 2021 Sep 7. doi: 10.48550/arXiv.2108.02924, 10.1007/978-3-030-86362-3_45

Tang, Xuejiao ; Zhang, Wenbin ; Yu, Yi et al. / Interpretable Visual Understanding with Cognitive Attention Network. Artificial Neural Networks and Machine Learning – ICANN 2021 - 30th International Conference on Artificial Neural Networks, Proceedings. ed. / Igor Farkaš ; Paolo Masulli ; Sebastian Otte ; Stefan Wermter. Springer Science and Business Media Deutschland GmbH, 2021. p. 555-568 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).


@inproceedings{874ac6938dc24e80a92b2bdcef552558,
title = "Interpretable Visual Understanding with Cognitive Attention Network",
abstract = "While image understanding on recognition-level has achieved remarkable advancements, reliable visual scene understanding requires comprehensive image understanding on recognition-level but also cognition-level, which calls for exploiting the multi-source information as well as learning different levels of understanding and extensive commonsense knowledge. In this paper, we propose a novel Cognitive Attention Network (CAN) for visual commonsense reasoning to achieve interpretable visual understanding. Specifically, we first introduce an image-text fusion module to fuse information from images and text collectively. Second, a novel inference module is designed to encode commonsense among image, query and response. Extensive experiments on large-scale Visual Commonsense Reasoning (VCR) benchmark dataset demonstrate the effectiveness of our approach. The implementation is publicly available at https://github.com/tanjatang/CAN.",
author = "Xuejiao Tang and Wenbin Zhang and Yi Yu and Kea Turner and Tyler Derr and Mengyu Wang and Eirini Ntoutsi",
year = "2021",
doi = "10.48550/arXiv.2108.02924",
language = "English",
isbn = "9783030863616",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Science and Business Media Deutschland GmbH",
pages = "555--568",
editor = "Igor Farka{\v s} and Paolo Masulli and Sebastian Otte and Stefan Wermter",
booktitle = "Artificial Neural Networks and Machine Learning – ICANN 2021 - 30th International Conference on Artificial Neural Networks, Proceedings",
address = "Germany",
note = "30th International Conference on Artificial Neural Networks, ICANN 2021 ; Conference date: 14-09-2021 Through 17-09-2021",
}


TY  - GEN
T1  - Interpretable Visual Understanding with Cognitive Attention Network
AU  - Tang, Xuejiao
AU  - Zhang, Wenbin
AU  - Yu, Yi
AU  - Turner, Kea
AU  - Derr, Tyler
AU  - Wang, Mengyu
AU  - Ntoutsi, Eirini
PY  - 2021
Y1  - 2021
N2  - While image understanding on recognition-level has achieved remarkable advancements, reliable visual scene understanding requires comprehensive image understanding on recognition-level but also cognition-level, which calls for exploiting the multi-source information as well as learning different levels of understanding and extensive commonsense knowledge. In this paper, we propose a novel Cognitive Attention Network (CAN) for visual commonsense reasoning to achieve interpretable visual understanding. Specifically, we first introduce an image-text fusion module to fuse information from images and text collectively. Second, a novel inference module is designed to encode commonsense among image, query and response. Extensive experiments on large-scale Visual Commonsense Reasoning (VCR) benchmark dataset demonstrate the effectiveness of our approach. The implementation is publicly available at https://github.com/tanjatang/CAN.
AB  - While image understanding on recognition-level has achieved remarkable advancements, reliable visual scene understanding requires comprehensive image understanding on recognition-level but also cognition-level, which calls for exploiting the multi-source information as well as learning different levels of understanding and extensive commonsense knowledge. In this paper, we propose a novel Cognitive Attention Network (CAN) for visual commonsense reasoning to achieve interpretable visual understanding. Specifically, we first introduce an image-text fusion module to fuse information from images and text collectively. Second, a novel inference module is designed to encode commonsense among image, query and response. Extensive experiments on large-scale Visual Commonsense Reasoning (VCR) benchmark dataset demonstrate the effectiveness of our approach. The implementation is publicly available at https://github.com/tanjatang/CAN.
UR  - http://www.scopus.com/inward/record.url?scp=85115445553&partnerID=8YFLogxK
U2  - 10.48550/arXiv.2108.02924
DO  - 10.48550/arXiv.2108.02924
M3  - Conference contribution
AN  - SCOPUS:85115445553
SN  - 9783030863616
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 555
EP  - 568
BT  - Artificial Neural Networks and Machine Learning – ICANN 2021 - 30th International Conference on Artificial Neural Networks, Proceedings
A2  - Farkaš, Igor
A2  - Masulli, Paolo
A2  - Otte, Sebastian
A2  - Wermter, Stefan
PB  - Springer Science and Business Media Deutschland GmbH
T2  - 30th International Conference on Artificial Neural Networks, ICANN 2021
Y2  - 14 September 2021 through 17 September 2021
ER  -
