Learning Audio-Video Language Representations

Automatic speech recognition has seen recent advancements powered by machine learning, but it is still only available for a small fraction of the more than 7,000 languages spoken worldwide due to the reliance on manually annotated speech data. Unlabeled multi-modal data, such as videos, are now incr...

Bibliographic Details
Main Author: Rouditchenko, Andrew
Other Authors: Glass, James
Format: Thesis
Published: Massachusetts Institute of Technology 2022
Online Access: https://hdl.handle.net/1721.1/139024