Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset

Visually-grounded spoken language datasets can enable models to learn cross-modal correspondences with very weak supervision. However, modern audio-visual datasets contain biases that undermine the real-world performance of models trained on that data. We introduce Spoken ObjectNet, which is des...

Bibliographic Details
Main Authors:	Palmer, Ian, Rouditchenko, Andrew, Barbu, Andrei, Katz, Boris, Glass, James
Format:	Article
Published:	Center for Brains, Minds and Machines (CBMM), The 22nd Annual Conference of the International Speech Communication Association (Interspeech) 2022
Online Access:	https://hdl.handle.net/1721.1/141358
