Visual grounding in video for unsupervised word translation
There are thousands of actively spoken languages on Earth, but a single visual world. Grounding in this visual world has the potential to bridge the gap between all these languages. Our goal is to use visual grounding to improve unsupervised word mapping between languages. The key idea is to establi...
Main Authors: Sigurdsson, GA; Alayrac, JB; Nematzadeh, A; Smaira, L; Malinowski, M; Carreira, J; Blunsom, P; Zisserman, A
Format: Journal article
Language: English
Published: IEEE, 2020
Similar Items
- The visual centrifuge: Model-free layered video representations
  by: Alayrac, J-B, et al.
  Published: (2020)
- End-to-end learning of visual representations from uncurated instructional videos
  by: Miech, A, et al.
  Published: (2020)
- Controllable attention for structured layered video decomposition
  by: Alayrac, J-B, et al.
  Published: (2020)
- Unsupervised word translation with adversarial autoencoder
  by: Mohiuddin, Tasnim, et al.
  Published: (2021)
- Unsupervised Word Translation with Adversarial Autoencoder
  by: Mohiuddin, Tasnim, et al.
  Published: (2020-06-01)