SoundNet: learning sound representations from unlabeled video

We learn rich natural sound representations by capitalizing on large amounts of unlabeled sound data collected in the wild. We leverage the natural synchronization between vision and sound to learn an acoustic representation using two million unlabeled videos. Unlabeled video has the advantage that...

Bibliographic Details
Main Authors: Aytar, Yusuf, Vondrick, Carl, Torralba, Antonio
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format: Article
Language: English
Published: 2020
Online Access: https://hdl.handle.net/1721.1/124993