Contrastive training of a multimodal encoder for medical visual question answering

Visual Question Answering (VQA) models for medical images aim to answer diagnostically relevant natural-language questions based on visual content. In this article, we propose a novel approach to address this problem, which combines a strong image encoder based on EfficientNetV2 with a mult...

Bibliographic Details
Main Authors: João Daniel Silva, Bruno Martins, João Magalhães
Format: Article
Language: English
Published: Elsevier 2023-05-01
Series: Intelligent Systems with Applications
Subjects:
Online Access: http://www.sciencedirect.com/science/article/pii/S2667305323000467