Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset
Visually-grounded spoken language datasets can enable models to learn cross-modal correspondences with very weak supervision. However, modern audio-visual datasets contain biases that undermine the real-world performance of models trained on that data. We introduce Spoken ObjectNet, which is des...
Main Authors: Palmer, Ian; Rouditchenko, Andrew; Barbu, Andrei; Katz, Boris; Glass, James
Format: Article
Published: Center for Brains, Minds and Machines (CBMM), The 22nd Annual Conference of the International Speech Communication Association (Interspeech), 2022
Online Access: https://hdl.handle.net/1721.1/141358
Similar Items
- Spoken ObjectNet: Creating a Bias-Controlled Spoken Caption Dataset
  by: Palmer, Ian A.
  Published: (2022)
- ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models
  by: Barbu, A., et al.
  Published: (2021)
- Unsupervised learning of spoken language with visual context
  by: Harwath, David, et al.
  Published: (2020)
- Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input
  by: Harwath, David, et al.
  Published: (2021)
- Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input
  by: Harwath, David F., et al.
  Published: (2020)