SoundNet: Learning Sound Representations from Unlabeled Video
We learn rich natural sound representations by capitalizing on large amounts of unlabeled sound data collected in the wild. We leverage the natural synchronization between vision and sound to learn an acoustic representation using two million unlabeled videos. Unlabeled video has the advantage that...
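The abstract describes using the vision stream of synchronized video as supervision for a sound network. As a rough illustration of that idea only, the sketch below trains a small 1D convolutional audio network to match soft class distributions produced by a (frozen) visual recognition teacher via a KL-divergence objective; the class count, layer sizes, and the names `SoundNetStyleEncoder` and `distillation_loss` are placeholders and not the paper's released architecture or code.

```python
# Minimal teacher-student sketch (assumptions, not the authors' implementation):
# a vision network labels video frames with soft class probabilities, and a
# 1D conv net over the raw waveform is trained to predict the same distribution.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoundNetStyleEncoder(nn.Module):
    """Small illustrative 1D conv net over raw waveforms."""
    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=64, stride=2, padding=32), nn.BatchNorm1d(16), nn.ReLU(),
            nn.MaxPool1d(8),
            nn.Conv1d(16, 32, kernel_size=32, stride=2, padding=16), nn.BatchNorm1d(32), nn.ReLU(),
            nn.MaxPool1d(8),
            nn.Conv1d(32, 64, kernel_size=16, stride=2, padding=8), nn.BatchNorm1d(64), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, 1, samples) -> unnormalized class scores
        h = self.features(waveform).squeeze(-1)
        return self.classifier(h)

def distillation_loss(student_logits: torch.Tensor, teacher_probs: torch.Tensor) -> torch.Tensor:
    # KL divergence between the frozen vision teacher's class distribution
    # and the sound student's predicted distribution.
    return F.kl_div(F.log_softmax(student_logits, dim=1), teacher_probs, reduction="batchmean")

if __name__ == "__main__":
    student = SoundNetStyleEncoder(num_classes=1000)
    optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)
    # Stand-ins for one batch of raw audio and the vision teacher's soft labels.
    audio = torch.randn(4, 1, 44100)                      # ~1 second at 44.1 kHz
    teacher_probs = torch.softmax(torch.randn(4, 1000), dim=1)
    loss = distillation_loss(student(audio), teacher_probs)
    loss.backward()
    optimizer.step()
    print(float(loss))
```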
| Main Authors: | Aytar, Yusuf; Vondrick, Carl; Torralba, Antonio |
|---|---|
| Other Authors: | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory |
| Format: | Article |
| Language: | English |
| Published: | 2020 |
| Online Access: | https://hdl.handle.net/1721.1/124993 |
Similar Items
- Anticipating Visual Representations from Unlabeled Video
  by: Vondrick, Carl, et al.
  Published: (2018)
- Learning Aligned Cross-Modal Representations from Weakly Aligned Data
  by: Castrejon, Lluis, et al.
  Published: (2017)
- The Sound of Pixels
  by: Zhao, Hang, et al.
  Published: (2020)
- Cross-Modal Scene Networks
  by: Aytar, Yusuf, et al.
  Published: (2021)
- Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning
  by: Owens, Andrew, et al.
  Published: (2021)