Text-Free Audio Captions of Short Videos from Latent Space Representation

In this thesis, we re-implement previous work on image-to-speech captioning and extend it to video-to-speech captioning. Specifically, we implement a text-free image-to-speech captioning pipeline that integrates four distinct machine learning models. We alter the models t...

Bibliographic Details
Main Author: Agarwal, Anisha
Other Authors: Oliva, Aude
Format: Thesis
Published: Massachusetts Institute of Technology 2022
Online Access: https://hdl.handle.net/1721.1/144873