Learning multimodal VAEs through mutual supervision
Multimodal VAEs seek to model the joint distribution over heterogeneous data (e.g. vision, language), whilst also capturing a shared representation across such modalities. Prior work has typically combined information from the modalities by reconciling idiosyncratic representations directly in the...
Main Authors: | Tom Joy, Yuge Shi, Philip H. S. Torr, Tom Rainforth, Sebastian M. Schmon, N. Siddharth |
---|---|
Format: | Conference item |
Language: | English |
Published: | OpenReview, 2022 |