Transformer Module Networks for Systematic Generalization in Visual Question Answering
Transformer-based models achieve great performance on Visual Question Answering (VQA). However, when we evaluate them on systematic generalization, i.e., handling novel combinations of known concepts, their performance degrades. Neural Module Networks (NMNs) are a promising approach for systematic...
Main Authors: | Yamada, Moyuru; D'Amario, Vanessa; Takemoto, Kentaro; Boix, Xavier; Sasaki, Tomotake |
---|---|
Format: | Article |
Published: | Center for Brains, Minds and Machines (CBMM), 2022 |
Online Access: | https://hdl.handle.net/1721.1/139843 |
Similar Items
- The Data Efficiency of Deep Learning Is Degraded by Unnecessary Input Dimensions
  by: D'Amario, Vanessa, et al.
  Published: (2022)
- Visual questioning and answering
  by: Ong, Zavier Jian Le
  Published: (2024)
- Review of Visual Question Answering Technology
  by: WANG Yu, SUN Haichun
  Published: (2023-07-01)