Spoken ObjectNet: Creating a Bias-Controlled Spoken Caption Dataset

Visually-grounded spoken language datasets can enable models to learn cross-modal correspondences with very weak supervision. However, modern audio-visual datasets contain biases that undermine the real-world performance of models trained on that data. We introduce Spoken ObjectNet, which is designe...

Bibliographic Details
Main Author:	Palmer, Ian A.
Other Authors:	Glass, James R.
Format:	Thesis
Published:	Massachusetts Institute of Technology 2022
Online Access:	https://hdl.handle.net/1721.1/139030
