Multi-Modal Alignment of Visual Question Answering Based on Multi-Hop Attention Mechanism

The alignment of information between the image and the question is of great significance in the visual question answering (VQA) task. Self-attention is commonly used to generate attention weights between image and question. These attention weights can align the two modalities. Through the attention weig...
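To make the idea of "attention weights that align two modalities" concrete, here is a minimal sketch of generic scaled dot-product attention in which each question word vector (query) produces a probability distribution over image region vectors (keys). This is not the article's multi-hop attention mechanism, which the truncated abstract does not describe. The function names `attention_weights` and `attend`, the toy 2-dimensional features, and the assumption that question and image features already share one embedding space are all hypothetical, chosen only for illustration.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature dimensions must match")
    return sum(x * y for x, y in zip(a, b))


def attention_weights(question: List[Vector], regions: List[Vector]) -> List[Vector]:
    """For each question word vector (query), return a distribution over
    image region vectors (keys) using scaled dot-product scores.
    Assumption: both modalities are projected to the same dimension d."""
    d = len(regions[0])
    scale = 1.0 / math.sqrt(d)
    return [softmax([dot(q, r) * scale for r in regions]) for q in question]


def attend(question: List[Vector], regions: List[Vector]) -> List[Vector]:
    """Weighted sum of region features for each question word, i.e. an
    image feature 'aligned' to that word by its attention weights."""
    weights = attention_weights(question, regions)
    d = len(regions[0])
    return [
        [sum(w * r[k] for w, r in zip(row, regions)) for k in range(d)]
        for row in weights
    ]


if __name__ == "__main__":
    # Hypothetical toy embeddings: two question words, three image regions.
    question = [[1.0, 0.0], [0.0, 1.0]]
    regions = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
    for i, row in enumerate(attention_weights(question, regions)):
        print(f"word {i}:", [round(w, 3) for w in row])
```

Each row of weights sums to 1, and a word's highest weight falls on the region whose features point in the same direction as the word's features; that peak is what "aligning" the two modalities means in this simplified setting. Real VQA models learn the projections into the shared space rather than assuming them.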


Bibliographic Details
Main Authors: Qihao Xia, Chao Yu, Yinong Hou, Pingping Peng, Zhengqi Zheng, Wen Chen
Format: Article
Language: English
Published: MDPI AG 2022-06-01
Series: Electronics
Online Access: https://www.mdpi.com/2079-9292/11/11/1778