Semi-Supervised Implicit Augmentation for Data-Scarce VQA
Vision-language models (VLMs) have become increasingly capable of solving complex vision-language tasks in recent years. Visual question answering (VQA) is one of the primary downstream tasks for assessing the capability of VLMs, as it helps gauge the multimodal understanding of a VLM in...
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2024-02-01 |
Series: | Computer Sciences & Mathematics Forum |
Online Access: | https://www.mdpi.com/2813-0324/9/1/3 |