Video understanding using multimodal deep learning
Our experience of the world is multimodal; however, deep learning networks have traditionally been designed for and trained on unimodal inputs such as images, audio segments or text. In this thesis we develop strategies to exploit multimodal information (in the form of vision, text, speech a...
Author: | Nagrani, A |
---|---|
Other Authors: | Zisserman, A |
Material Type: | Thesis |
Language: | English |
Published: | 2020 |
Subjects: |
Similar Items
- Sign language understanding using multimodal learning
  Author: Momeni, L
  Published: (2024)
- Understanding Multimodal Popularity Prediction of Social Media Videos With Self-Attention
  Author: Adam Bielski, et al.
  Published: (2018-01-01)
- End-to-end learning, and audio-visual human-centric video understanding
  Author: Brown, A
  Published: (2022)
- Holistic image understanding with deep learning and dense random fields
  Author: Zheng, S
  Published: (2016)
- Learning with multimodal self-supervision
  Author: Chen, H
  Published: (2021)