A Multi-level Mesh Mutual Attention Model for Visual Question Answering

Abstract: Visual question answering is a complex multimodal task involving images and text, with broad application prospects in human–computer interaction and medical assistance. Therefore, how to deal with the feature interaction and multimodal feature fusion between the critical regions in the imag...

Bibliographic Details
Main Authors: Zhi Lei, Guixian Zhang, Lijuan Wu, Kui Zhang, Rongjiao Liang
Format: Article
Language: English
Published: SpringerOpen 2022-10-01
Series: Data Science and Engineering
Online Access: https://doi.org/10.1007/s41019-022-00200-9