Contrastive training of a multimodal encoder for medical visual question answering
Models for Visual Question Answering (VQA) on medical images aim to answer diagnostically relevant natural language questions with basis on visual contents. In this article, we propose a novel approach to address this problem, which combines a strong image encoder based on EfficientNetV2 with a mult...
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: |
Elsevier
2023-05-01
|
Series: | Intelligent Systems with Applications |
Subjects: | |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2667305323000467 |