Interpreting vision and language generative models with semantic visual priors

When applied to image-to-text models, explainability methods face two challenges. First, they often provide token-by-token explanations; that is, they compute a visual explanation for each token of the generated sequence. This makes explanations expensive to compute and unable to comprehensively expla...

Bibliographic Details
Main Authors:	Michele Cafagna, Lina M. Rojas-Barahona, Kees van Deemter, Albert Gatt
Format:	Article
Language:	English
Published:	Frontiers Media S.A. 2023-09-01
Series:	Frontiers in Artificial Intelligence
Subjects:	vision and language; multimodality; explainability; image captioning; visual question answering; natural language generation
Online Access:	https://www.frontiersin.org/articles/10.3389/frai.2023.1220476/full

Similar Items