Learning Audio-Video Language Representations

Automatic speech recognition has seen recent advancements powered by machine learning, but it is still only available for a small fraction of the more than 7,000 languages spoken worldwide due to the reliance on manually annotated speech data. Unlabeled multi-modal data, such as videos, are now incr...

Bibliographic Details
Main Author:	Rouditchenko, Andrew
Other Authors:	Glass, James
Format:	Thesis
Published:	Massachusetts Institute of Technology 2022
Online Access:	https://hdl.handle.net/1721.1/139024
