Video understanding using multimodal deep learning

<p>Our experience of the world is multimodal, however deep learning networks have been traditionally designed for and trained on unimodal inputs such as images, audio segments or text. In this thesis we develop strategies to exploit multimodal information (in the form of vision, text, speech a...

全面介紹

書目詳細資料
主要作者: Nagrani, A
其他作者: Zisserman, A
格式: Thesis
語言:English
出版: 2020
主題:

相似書籍