Unified Transformer with Cross-Modal Mixture Experts for Remote-Sensing Visual Question Answering

Remote-sensing visual question answering (RSVQA) aims to answer questions about remote sensing images accurately by leveraging both visual and textual information during inference. However, most existing methods ignore the significance of the interaction between...

Bibliographic Details
Main Authors:	Gang Liu, Jinlong He, Pengfei Li, Shenjun Zhong, Hongyang Li, Genrong He
Format:	Article
Language:	English
Published:	MDPI AG 2023-09-01
Series:	Remote Sensing
Subjects:	remote-sensing visual question answering; cross-modal mixture experts; cross-modal attention; transformer; vision transformer; BERT
Online Access:	https://www.mdpi.com/2072-4292/15/19/4682

Similar Items