End-to-end learning of visual representations from uncurated instructional videos

Annotating videos is cumbersome, expensive and not scalable. Yet, many strong video models still rely on manually annotated data. With the recent introduction of the HowTo100M dataset, narrated videos now offer the possibility of learning video representations without manual supervision. In this wor...

Bibliographic details
Main authors:	Miech, A, Alayrac, J-B, Smaira, L, Laptev, I, Sivic, J, Zisserman, A
Format:	Conference item
Language:	English
Published / Created:	IEEE 2020

Similar items

The visual centrifuge: Model-free layered video representations
by: Alayrac, J-B, et al.
Published / Created: (2020)

Visual grounding in video for unsupervised word translation
by: Sigurdsson, GA, et al.
Published / Created: (2020)

Semi-supervised learning of facial attributes in video
by: Cherniavsky, N, et al.
Published / Created: (2013)

Video Google: efficient visual search of videos
by: Sivic, J, et al.
Published / Created: (2007)

Deep learning for automated visual inspection of uncured rubber
by: Smith, James Thomas Howard
Published / Created: (2018)

Efficient visual search for objects in videos
by: Sivic, J, et al.
Published / Created: (2008)

Efficient visual content retrieval and mining in videos
by: Sivic, J, et al.
Published / Created: (2004)

Efficient visual search of videos cast as text retrieval
by: Sivic, J, et al.
Published / Created: (2008)

End-to-end learning, and audio-visual human-centric video understanding
by: Brown, A
Published / Created: (2022)

Robust Learning from Uncurated Data
by: Chuang, Ching-Yao
Published / Created: (2023)

End to End Alignment Learning of Instructional Videos with Spatiotemporal Hybrid Encoding and Decoding Space Reduction
by: Lin Wang, et al.
Published / Created: (2021-05-01)

Frozen in time: A joint video and image encoder for end-to-end retrieval
by: Bain, M, et al.
Published / Created: (2022)

Sight to Sound: An End-to-End Approach for Visual Piano Transcription
by: Koepke, S, et al.
Published / Created: (2020)

Video Google: a text retrieval approach to object matching in videos
by: Sivic, J, et al.
Published / Created: (2003)

End-to-end representation learning for Correlation Filter based tracking
by: Valmadre, J, et al.
Published / Created: (2017)

“Who are you?” - Learning person specific classifiers from video
by: Sivic, J, et al.
Published / Created: (2009)

Cancer immunotherapies: A hope for the uncurable?
by: Firas Hamdan, et al.
Published / Created: (2023-02-01)

Controllable attention for structured layered video decomposition
by: Alayrac, J-B, et al.
Published / Created: (2020)

Video data mining using configurations of viewpoint invariant regions
by: Sivic, J, et al.
Published / Created: (2004)

Object level grouping for video shots
by: Sivic, J, et al.
Published / Created: (2006)

Object level grouping for video shots
by: Sivic, J, et al.
Published / Created: (2004)

Video representation learning by dense predictive coding
by: Han, T, et al.
Published / Created: (2019)

Modeling of Shear Rheological Behavior of Uncured Rubber Melt
by: Yang Hengxiao, et al.
Published / Created: (2020-12-01)

Person spotting: video shot retrieval for face sets
by: Sivic, J, et al.
Published / Created: (2007)

An End-to-End Multiplex Graph Neural Network for Graph Representation Learning
by: Yanyan Liang, et al.
Published / Created: (2021-01-01)

Spatial and temporal learning representation for end-to-end recording device identification
by: Chunyan Zeng, et al.
Published / Created: (2021-07-01)

Influencer Loss: End-to-end Geometric Representation Learning for Track Reconstruction
by: Murnane Daniel
Published / Created: (2024-01-01)

Taking the bite out of automated naming of characters in TV video
by: Everingham, M, et al.
Published / Created: (2008)

Quantifying inter- and intra-ply shear in the deformation of uncured composite laminates
by: S. Erland, et al.
Published / Created: (2021-04-01)

Self-supervised co-training for video representation learning
by: Han, T, et al.
Published / Created: (2020)

End-to-end hypoglossal-facial nerve anastomosis - surgical video
by: A. Ferreira, et al.
Published / Created: (2021-01-01)

End-to-end depth from motion with stabilized monocular videos
by: C. Pinard, et al.
Published / Created: (2017-08-01)

Memory-augmented dense predictive coding for video representation learning
by: Han, T, et al.
Published / Created: (2020)

End-to-end transport for video QoE fairness
by: Nathan, Vikram, et al.
Published / Created: (2021)

End-to-end QOS management of compressed video streams
by: See, Shu Wei
Published / Created: (2008)

Development of an End-to-End Deep Learning Framework for Sign Language Recognition, Translation, and Video Generation
by: B. Natarajan, et al.
Published / Created: (2022-01-01)

End-to-End Deep One-Class Learning for Anomaly Detection in UAV Video Stream
by: Slim Hamdi, et al.
Published / Created: (2021-05-01)

An end-to-end implicit neural representation architecture for medical volume data
by: Armin Sheibanifard, et al.
Published / Created: (2025-01-01)

3D Object-Oriented Learning: An End-to-end Transformation-Disentangled 3D Representation
by: Liao, Qianli, et al.
Published / Created: (2018)

Unsupervised Representation Learning with Task-Agnostic Feature Masking for Robust End-to-End Speech Recognition
by: June-Woo Kim, et al.
Published / Created: (2023-01-01)