Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
Generative multimodal content is increasingly prevalent in much of the content creation arena, as it has the potential to allow artists and media personnel to create pre-production mockups by quickly bringing their ideas to life. The generation of audio from text prompts is an important aspect of su...
Main Authors: | , , , , , |
---|---|
Format: | Article |
Language: | English |
Published: | ACM, Proceedings of the 32nd ACM International Conference on Multimedia, 2024 |
Online Access: | https://hdl.handle.net/1721.1/157614 |