Details
Original language | English |
---|---|
Title of host publication | Artificial Neural Networks and Machine Learning – ICANN 2021 - 30th International Conference on Artificial Neural Networks, Proceedings |
Editors | Igor Farkaš, Paolo Masulli, Sebastian Otte, Stefan Wermter |
Publisher | Springer Science and Business Media Deutschland GmbH |
Pages | 555-568 |
Number of pages | 14 |
ISBN (electronic) | 978-3-030-86362-3 |
ISBN (print) | 978-3-030-86361-6 |
Publication status | Published - 2021 |
Externally published | Yes |
Event | 30th International Conference on Artificial Neural Networks, ICANN 2021 - Virtual, Online. Duration: 14 Sep 2021 → 17 Sep 2021 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 12891 LNCS |
ISSN (print) | 0302-9743 |
ISSN (electronic) | 1611-3349 |
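ISBN-13 and ISSN identifiers both end in a check digit, so the identifiers listed in the tables above can be sanity-checked mechanically. The sketch below applies the standard check-digit rules (ISBN-13: alternating weights 1 and 3, with the weighted sum divisible by 10; ISSN: weights 8 down to 2, modulus 11, with `X` standing for 10). The function names `isbn13_is_valid` and `issn_is_valid` are hypothetical helpers, not part of any library.

```python
def isbn13_is_valid(isbn: str) -> bool:
    """ISBN-13: digits weighted 1,3,1,3,... must sum to a multiple of 10."""
    digits = [int(c) for c in isbn if c.isdigit()]  # ignore hyphens
    if len(digits) != 13:
        return False
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return total % 10 == 0


def issn_is_valid(issn: str) -> bool:
    """ISSN: first 7 digits weighted 8..2; check = (11 - sum mod 11) mod 11, 10 -> 'X'."""
    chars = [c for c in issn.upper() if c.isdigit() or c == "X"]
    if len(chars) != 8 or "X" in chars[:7]:
        return False
    total = sum(int(c) * w for c, w in zip(chars[:7], range(8, 1, -1)))
    expected = (11 - total % 11) % 11
    check = 10 if chars[7] == "X" else int(chars[7])
    return check == expected


if __name__ == "__main__":
    # Identifiers taken from the record; all four print True.
    for isbn in ("978-3-030-86362-3", "9783030863616"):
        print(isbn, isbn13_is_valid(isbn))
    for issn in ("0302-9743", "1611-3349"):
        print(issn, issn_is_valid(issn))
```

Stripping hyphens before checking lets the same function accept both the hyphenated and the compact forms that appear in this record.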
Abstract
While image understanding at the recognition level has advanced remarkably, reliable visual scene understanding requires comprehensive image understanding not only at the recognition level but also at the cognition level. This calls for exploiting multi-source information as well as learning different levels of understanding and extensive commonsense knowledge. In this paper, we propose a novel Cognitive Attention Network (CAN) for visual commonsense reasoning to achieve interpretable visual understanding. Specifically, we first introduce an image-text fusion module to fuse information from images and text jointly. Second, a novel inference module is designed to encode commonsense among image, query and response. Extensive experiments on the large-scale Visual Commonsense Reasoning (VCR) benchmark dataset demonstrate the effectiveness of our approach. The implementation is publicly available at https://github.com/tanjatang/CAN.
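The abstract does not spell out the internals of the fusion and inference modules. As a rough illustration only, the sketch below shows one generic way to fuse text tokens with image-region features (scaled dot-product cross-attention) and then score VCR-style candidate responses for a question. It is not CAN's implementation (that lives in the linked repository); the function names, dimensions, the mean-pool-plus-linear scorer, and the random weights are all hypothetical, so the printed probabilities only demonstrate shapes and normalization.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(text, regions, w_q, w_k, w_v):
    """Each text token (T, d) attends over image regions (R, d); returns (T, d)."""
    q, k, v = text @ w_q, regions @ w_k, regions @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (T, R), rows sum to 1
    return text + attn @ v  # residual add of attended visual context


def score_responses(query, responses, regions, params):
    """Hypothetical scorer: probability over candidate responses (VCR uses four)."""
    w_q, w_k, w_v, w_out = params
    logits = []
    for resp in responses:
        tokens = np.concatenate([query, resp], axis=0)  # (Tq + Tr, d)
        fused = cross_attention(tokens, regions, w_q, w_k, w_v)
        logits.append(fused.mean(axis=0) @ w_out)       # pooled vector -> scalar
    return softmax(np.array(logits))


rng = np.random.default_rng(0)
d = 16  # hypothetical embedding size
params = tuple(rng.normal(scale=0.1, size=(d, d)) for _ in range(3)) + (rng.normal(size=d),)
query = rng.normal(size=(5, d))                        # hypothetical 5-token question
responses = [rng.normal(size=(4, d)) for _ in range(4)]
regions = rng.normal(size=(10, d))                     # hypothetical 10 detected regions
probs = score_responses(query, responses, regions, params)
print(probs.round(3), "predicted choice:", int(probs.argmax()))
```

Concatenating the query with each candidate before attending means every response is judged in the context of the same question and image, which is the kind of image-query-response interaction the abstract attributes to the inference module; a trained model would learn the weights rather than sample them.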
ASJC Scopus subject areas
- Mathematics (all)
- Theoretical Computer Science
- Computer Science (all)
- General Computer Science
Cite
Interpretable Visual Understanding with Cognitive Attention Network. / Tang, Xuejiao; Zhang, Wenbin; Yu, Yi; Turner, Kea; Derr, Tyler; Wang, Mengyu; Ntoutsi, Eirini. Artificial Neural Networks and Machine Learning – ICANN 2021 - 30th International Conference on Artificial Neural Networks, Proceedings. ed. / Igor Farkaš; Paolo Masulli; Sebastian Otte; Stefan Wermter. Springer Science and Business Media Deutschland GmbH, 2021. pp. 555-568 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12891 LNCS).
Publication: Contribution to book/report/anthology/conference proceedings › Article in conference proceedings › Research › Peer-reviewed
TY - GEN
T1 - Interpretable Visual Understanding with Cognitive Attention Network
AU - Tang, Xuejiao
AU - Zhang, Wenbin
AU - Yu, Yi
AU - Turner, Kea
AU - Derr, Tyler
AU - Wang, Mengyu
AU - Ntoutsi, Eirini
PY - 2021
Y1 - 2021
UR - http://www.scopus.com/inward/record.url?scp=85115445553&partnerID=8YFLogxK
U2 - 10.48550/arXiv.2108.02924
DO - 10.48550/arXiv.2108.02924
M3 - Conference contribution
AN - SCOPUS:85115445553
SN - 9783030863616
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 555
EP - 568
BT - Artificial Neural Networks and Machine Learning – ICANN 2021 - 30th International Conference on Artificial Neural Networks, Proceedings
A2 - Farkaš, Igor
A2 - Masulli, Paolo
A2 - Otte, Sebastian
A2 - Wermter, Stefan
PB - Springer Science and Business Media Deutschland GmbH
T2 - 30th International Conference on Artificial Neural Networks, ICANN 2021
Y2 - 14 September 2021 through 17 September 2021
ER -
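The RIS export above is a line-oriented format: a two-letter tag, a hyphen separator, and a value, with repeatable tags such as AU (author) and A2 (editor) and a closing ER line marking the end of the record. The sketch below parses one record into a dictionary, assuming one record per string. The name `parse_ris` is hypothetical, and the tolerant whitespace pattern is an assumption made because the record above uses a single space before the hyphen where the RIS convention is two.

```python
import re
from collections import defaultdict

# Accept "TAG  - value" (RIS convention) as well as "TAG - value" (as in the record above).
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])\s+-\s?(.*)$")


def parse_ris(text: str) -> dict[str, list[str]]:
    """Parse one RIS record into {tag: [values...]}; repeated tags keep their order."""
    record: dict[str, list[str]] = defaultdict(list)
    for line in text.splitlines():
        m = RIS_LINE.match(line.strip())
        if not m:
            continue  # skip blank or non-RIS lines
        tag, value = m.group(1), m.group(2).strip()
        if tag == "ER":
            break  # end of record
        record[tag].append(value)
    return dict(record)


# Usage on an excerpt of the record above.
sample = """TY - GEN
T1 - Interpretable Visual Understanding with Cognitive Attention Network
AU - Tang, Xuejiao
AU - Zhang, Wenbin
SP - 555
EP - 568
ER -"""
rec = parse_ris(sample)
print(rec["AU"])  # ['Tang, Xuejiao', 'Zhang, Wenbin']
pages = int(rec["EP"][0]) - int(rec["SP"][0]) + 1
print(pages)      # 14, matching the page count in the details table
```

Storing every tag as a list keeps the parser uniform: single-valued tags such as T1 simply hold one element, while AU and A2 preserve author and editor order.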