Text-Free Audio Captions of Short Videos from Latent Space Representation
In this thesis, we re-implement previous work exploring image to speech captioning. We expand upon the work to implement video to speech captioning. Specifically, we implement a text-free image to speech captioning pipeline that integrates four distinct machine learning models. We alter the models t...
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis |
Published: |
Massachusetts Institute of Technology
2022
|
Online Access: | https://hdl.handle.net/1721.1/144873 |