Augmenting Inputs using a Novel Figure-to-Text Pipeline to Assist Visual Language Models in Answering Scientific Domain Queries
Recent advancements in visual language models (VLMs) have transformed the way we interpret and interact with digital imagery, bridging the gap between visual and textual data. However, models such as Bard, GPT-4V, and LLaVA often struggle with specialized fields, particularly when processing sc...
Main Author: | Gupta, Sejal |
---|---|
Other Authors: | Cafarella, Michael |
Format: | Thesis |
Published: | Massachusetts Institute of Technology, 2024 |
Online Access: | https://hdl.handle.net/1721.1/156824 |
Similar Items
- FigurA11y: AI Assistance for Writing Scientific Alt Text
  by: Singh, Nikhil, et al.
  Published: (2024)
- Query expansion techniques for question answering
  by: Bilotti, Matthew W. (Matthew William), 1981-
  Published: (2005)
- Augmenting Transformers for Open Domain Procedural Text Comprehension
  by: Pei, Yixuan
  Published: (2022)
- Debiasing visual question and answering with answer preference
  by: Zhang, Xinye
  Published: (2020)
- Visual questioning and answering
  by: Ong, Zavier Jian Le
  Published: (2024)