Semi-Supervised Implicit Augmentation for Data-Scarce VQA

Vision-language models (VLMs) have become increasingly capable of solving complex vision-language tasks in recent years. Visual question answering (VQA) is one of the primary downstream tasks for assessing the capability of VLMs, as it helps gauge the multimodal understanding of a VLM in...

Bibliographic Details
Main Authors: Bhargav Dodla, Kartik Hegde, A. N. Rajagopalan
Format: Article
Language: English
Published: MDPI AG 2024-02-01
Series: Computer Sciences & Mathematics Forum
Online Access: https://www.mdpi.com/2813-0324/9/1/3