Video understanding using multimodal deep learning
Our experience of the world is multimodal, however deep learning networks have been traditionally designed for and trained on unimodal inputs such as images, audio segments or text. In this thesis we develop strategies to exploit multimodal information (in the form of vision, text, speech a...
First author: Nagrani, A
Other authors: Zisserman, A
Format: Thesis
Language: English
Published: 2020
Subject:
Similar items:
- Sign language understanding using multimodal learning
  Author: Momeni, L
  Published: (2024)
- Understanding Multimodal Popularity Prediction of Social Media Videos With Self-Attention
  Author: Adam Bielski, et al.
  Published: (2018-01-01)
- End-to-end learning, and audio-visual human-centric video understanding
  Author: Brown, A
  Published: (2022)
- Holistic image understanding with deep learning and dense random fields
  Author: Zheng, S
  Published: (2016)
- Learning with multimodal self-supervision
  Author: Chen, H
  Published: (2021)