Visual grounding in video for unsupervised word translation
There are thousands of actively spoken languages on Earth, but a single visual world. Grounding in this visual world has the potential to bridge the gap between all these languages. Our goal is to use visual grounding to improve unsupervised word mapping between languages. The key idea is to establi...
Main Authors: Sigurdsson, GA; Alayrac, JB; Nematzadeh, A; Smaira, L; Malinowski, M; Carreira, J; Blunsom, P; Zisserman, A
Format: Journal article
Language: English
Published: IEEE, 2020
Similar Items
- The visual centrifuge: Model-free layered video representations
  by: Alayrac, J-B, et al.
  Published: (2020)
- End-to-end learning of visual representations from uncurated instructional videos
  by: Miech, A, et al.
  Published: (2020)
- Controllable attention for structured layered video decomposition
  by: Alayrac, J-B, et al.
  Published: (2020)
- Unsupervised word translation with adversarial autoencoder
  by: Mohiuddin, Tasnim, et al.
  Published: (2021)
- Unsupervised Word Translation with Adversarial Autoencoder
  by: Mohiuddin, Tasnim, et al.
  Published: (2020-06-01)