Modeling Multimodal Uncertainties via Probability Distribution Encoders Included Vision-Language Models

In the field of multimodal understanding and generation, tackling inherent uncertainties is essential for mitigating ambiguous interpretations across multiple targets. We introduce the Probability Distribution Encoder (PDE), a versatile, plug-and-play module that utilizes sequence-level and feature-...

Bibliographic Details
Main Authors: Junjie Wang, Yatai Ji, Yuxiang Zhang, Yanru Zhu, Tetsuya Sakai
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10373835/