End-to-end learning of visual representations from uncurated instructional videos

Annotating videos is cumbersome, expensive and not scalable. Yet, many strong video models still rely on manually annotated data. With the recent introduction of the HowTo100M dataset, narrated videos now offer the possibility of learning video representations without manual supervision. In this wor...

Bibliographic details
Main creators: Miech, A, Alayrac, J-B, Smaira, L, Laptev, I, Sivic, J, Zisserman, A
Format: Conference item
Language: English
Published / Created: IEEE 2020
