Details
| Original language | English |
| --- | --- |
| Title of host publication | Artificial Neural Networks and Machine Learning – ICANN 2021 - 30th International Conference on Artificial Neural Networks, Proceedings |
| Editors | Igor Farkaš, Paolo Masulli, Sebastian Otte, Stefan Wermter |
| Publisher | Springer Science and Business Media Deutschland GmbH |
| Pages | 555-568 |
| Number of pages | 14 |
| ISBN (electronic) | 978-3-030-86362-3 |
| ISBN (print) | 978-3-030-86361-6 |
| Publication status | Published - 2021 |
| Externally published | Yes |
| Event | 30th International Conference on Artificial Neural Networks, ICANN 2021 - Virtual, Online, 14-17 September 2021 |
Publication series
| Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
| --- | --- |
| Volume | 12891 LNCS |
| ISSN (print) | 0302-9743 |
| ISSN (electronic) | 1611-3349 |
Abstract
While image understanding at the recognition level has achieved remarkable advances, reliable visual scene understanding requires comprehensive image understanding not only at the recognition level but also at the cognition level, which calls for exploiting multi-source information as well as learning different levels of understanding and extensive commonsense knowledge. In this paper, we propose a novel Cognitive Attention Network (CAN) for visual commonsense reasoning to achieve interpretable visual understanding. Specifically, we first introduce an image-text fusion module to fuse information from images and text collectively. Second, a novel inference module is designed to encode commonsense among the image, query, and response. Extensive experiments on the large-scale Visual Commonsense Reasoning (VCR) benchmark dataset demonstrate the effectiveness of our approach. The implementation is publicly available at https://github.com/tanjatang/CAN.
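The abstract outlines a two-stage design: an image-text fusion module that combines visual and textual information, and an inference module that reasons over the image, query, and candidate response. The sketch below is only a rough illustration of that structure, assuming a PyTorch-style implementation; the module names, dimensions, and the use of multi-head cross-attention and a small Transformer encoder are illustrative assumptions, not the authors' exact architecture (see the linked repository for that).

```python
# A minimal sketch of the two components described in the abstract, assuming
# a PyTorch-style implementation. Names, dimensions, and layer choices are
# illustrative assumptions, not the authors' exact architecture
# (see https://github.com/tanjatang/CAN for the published code).
import torch
import torch.nn as nn


class ImageTextFusion(nn.Module):
    """Fuses visual region features with token embeddings via cross-attention."""

    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
        # text:  (batch, n_tokens, dim) token embeddings of the query or response
        # image: (batch, n_regions, dim) visual region features
        fused, _ = self.attn(query=text, key=image, value=image)
        return self.norm(text + fused)  # residual connection over the text stream


class InferenceModule(nn.Module):
    """Scores a candidate response given the fused query and response representations."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True), num_layers=2
        )
        self.score = nn.Linear(dim, 1)

    def forward(self, fused_query: torch.Tensor, fused_response: torch.Tensor) -> torch.Tensor:
        joint = torch.cat([fused_query, fused_response], dim=1)  # concat along sequence
        encoded = self.encoder(joint)
        return self.score(encoded.mean(dim=1)).squeeze(-1)  # one logit per example


if __name__ == "__main__":
    batch, dim = 2, 512
    image = torch.randn(batch, 36, dim)     # e.g. 36 detected regions per image
    query = torch.randn(batch, 20, dim)     # tokenized question
    response = torch.randn(batch, 15, dim)  # one candidate answer

    fusion = ImageTextFusion(dim)
    inference = InferenceModule(dim)
    logits = inference(fusion(query, image), fusion(response, image))
    print(logits.shape)  # torch.Size([2]): one score per example
```

In the VCR setting each question comes with several candidate responses; scoring every candidate with a shared inference module of this kind and selecting the highest-scoring one mirrors the standard answer-selection setup on that benchmark.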
ASJC Scopus subject areas
- Mathematics (all)
- Theoretical Computer Science
- Computer Science (all)
Cite this
Tang, Xuejiao; Zhang, Wenbin; Yu, Yi; Turner, Kea; Derr, Tyler; Wang, Mengyu; Ntoutsi, Eirini. Interpretable Visual Understanding with Cognitive Attention Network. Artificial Neural Networks and Machine Learning – ICANN 2021 - 30th International Conference on Artificial Neural Networks, Proceedings. ed. / Igor Farkaš; Paolo Masulli; Sebastian Otte; Stefan Wermter. Springer Science and Business Media Deutschland GmbH, 2021. p. 555-568 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12891 LNCS).
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Interpretable Visual Understanding with Cognitive Attention Network
AU - Tang, Xuejiao
AU - Zhang, Wenbin
AU - Yu, Yi
AU - Turner, Kea
AU - Derr, Tyler
AU - Wang, Mengyu
AU - Ntoutsi, Eirini
PY - 2021
Y1 - 2021
N2 - While image understanding at the recognition level has achieved remarkable advances, reliable visual scene understanding requires comprehensive image understanding not only at the recognition level but also at the cognition level, which calls for exploiting multi-source information as well as learning different levels of understanding and extensive commonsense knowledge. In this paper, we propose a novel Cognitive Attention Network (CAN) for visual commonsense reasoning to achieve interpretable visual understanding. Specifically, we first introduce an image-text fusion module to fuse information from images and text collectively. Second, a novel inference module is designed to encode commonsense among the image, query, and response. Extensive experiments on the large-scale Visual Commonsense Reasoning (VCR) benchmark dataset demonstrate the effectiveness of our approach. The implementation is publicly available at https://github.com/tanjatang/CAN.
AB - While image understanding at the recognition level has achieved remarkable advances, reliable visual scene understanding requires comprehensive image understanding not only at the recognition level but also at the cognition level, which calls for exploiting multi-source information as well as learning different levels of understanding and extensive commonsense knowledge. In this paper, we propose a novel Cognitive Attention Network (CAN) for visual commonsense reasoning to achieve interpretable visual understanding. Specifically, we first introduce an image-text fusion module to fuse information from images and text collectively. Second, a novel inference module is designed to encode commonsense among the image, query, and response. Extensive experiments on the large-scale Visual Commonsense Reasoning (VCR) benchmark dataset demonstrate the effectiveness of our approach. The implementation is publicly available at https://github.com/tanjatang/CAN.
UR - http://www.scopus.com/inward/record.url?scp=85115445553&partnerID=8YFLogxK
U2 - 10.48550/arXiv.2108.02924
DO - 10.48550/arXiv.2108.02924
M3 - Conference contribution
AN - SCOPUS:85115445553
SN - 9783030863616
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 555
EP - 568
BT - Artificial Neural Networks and Machine Learning – ICANN 2021 - 30th International Conference on Artificial Neural Networks, Proceedings
A2 - Farkaš, Igor
A2 - Masulli, Paolo
A2 - Otte, Sebastian
A2 - Wermter, Stefan
PB - Springer Science and Business Media Deutschland GmbH
T2 - 30th International Conference on Artificial Neural Networks, ICANN 2021
Y2 - 14 September 2021 through 17 September 2021
ER -