MSGeN: Multimodal Selective Generation Network for Grounded Explanations
Modern models have shown impressive capabilities in visual reasoning tasks. However, the interpretability of their decision-making processes remains a challenge, causing uncertainty in their reliability. In response, we present the Multimodal Selective Generation Network (MSGeN), a novel approach to...
Main Authors: | Dingbang Li, Wenzhou Chen, Xin Lin |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-12-01 |
Series: | Electronics |
Online Access: | https://www.mdpi.com/2079-9292/13/1/152 |
Similar Items
- Multimodal Natural Language Explanation Generation for Visual Question Answering Based on Multiple Reference Data
  by: He Zhu, et al.
  Published: (2023-05-01)
- Survey of Multimodal Medical Question Answering
  by: Hilmi Demirhan, et al.
  Published: (2023-12-01)
- Interpreting vision and language generative models with semantic visual priors
  by: Michele Cafagna, et al.
  Published: (2023-09-01)
- SBVQA 2.0: Robust End-to-End Speech-Based Visual Question Answering for Open-Ended Questions
  by: Faris Alasmary, et al.
  Published: (2023-01-01)
- VL-Meta: Vision-Language Models for Multimodal Meta-Learning
  by: Han Ma, et al.
  Published: (2024-01-01)