A Multi-level Mesh Mutual Attention Model for Visual Question Answering
Abstract: Visual question answering is a complex multimodal task involving images and text, with broad application prospects in human–computer interaction and medical assistance. Therefore, how to deal with the feature interaction and multimodal feature fusion between the critical regions in the imag...
Main Authors: | , , , , |
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen, 2022-10-01 |
Series: | Data Science and Engineering |
Online Access: | https://doi.org/10.1007/s41019-022-00200-9 |