Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Generative multimodal content is increasingly prevalent in much of the content creation arena, as it has the potential to allow artists and media personnel to create pre-production mockups by quickly bringing their ideas to life. The generation of audio from text prompts is an important aspect of su...

Bibliographic Details
Main Authors: Majumder, Navonil; Hung, Chia-Yu; Ghosal, Deepanway; Hsu, Wei-Ning; Mihalcea, Rada; Poria, Soujanya
Format: Article
Language: English
Published: ACM, Proceedings of the 32nd ACM International Conference on Multimedia, 2024
Online Access: https://hdl.handle.net/1721.1/157614