Video understanding using multimodal deep learning

Our experience of the world is multimodal; however, deep learning networks have traditionally been designed for and trained on unimodal inputs such as images, audio segments, or text. In this thesis we develop strategies to exploit multimodal information (in the form of vision, text, speech a...

Description

Bibliographic Details
First author: Nagrani, A
Other authors: Zisserman, A
Format: Thesis
Language: English
Published: 2020
Subjects:

Similar Items