Self-supervised learning of audio-visual objects from video

Our objective is to transform a video into a set of discrete audio-visual objects using self-supervised learning. To this end, we introduce a model that uses attention to localize and group sound sources, and optical flow to aggregate information over time. We demonstrate the effectiveness of the au...
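The abstract only names the mechanism ("attention to localize ... sound sources"). The sketch below is a rough illustration of that general idea and is not the authors' implementation. It assumes each spatial location of a video frame has a visual embedding, and that the audio track has one embedding in the same space. It scores every location by cosine similarity with the audio, turns the scores into a softmax attention map, and reads off the peak as a candidate sound-source location. The function names `audio_visual_attention` and `top_location`, the `temperature` value, and the synthetic data are hypothetical. The record does not describe the optical-flow aggregation over time or the grouping step, so the sketch leaves both out.

```python
import numpy as np

def audio_visual_attention(visual_feats, audio_emb, temperature=0.07, eps=1e-8):
    """Hypothetical sketch: visual_feats (H, W, D), audio_emb (D,) -> (H, W) attention map."""
    # L2-normalise so the dot product below is a cosine similarity.
    v = visual_feats / (np.linalg.norm(visual_feats, axis=-1, keepdims=True) + eps)
    a = audio_emb / (np.linalg.norm(audio_emb) + eps)
    sim = v @ a                      # (H, W) audio-visual similarity per location
    logits = sim / temperature       # temperature is an assumed hyperparameter
    logits -= logits.max()           # subtract max for numerical stability
    w = np.exp(logits)
    return w / w.sum()               # spatial softmax: weights sum to 1

def top_location(attn):
    """Return the (row, col) of the attention peak, a candidate sound source."""
    r, c = np.unravel_index(np.argmax(attn), attn.shape)
    return int(r), int(c)

# Synthetic usage example (made-up data, not from the paper).
rng = np.random.default_rng(0)
H, W, D = 8, 8, 16
audio = rng.normal(size=D)
visual = rng.normal(size=(H, W, D))
visual[2, 5] = audio                 # plant one location whose embedding matches the audio
attn = audio_visual_attention(visual, audio)
print(top_location(attn))            # (2, 5)
```

The planted location has a cosine similarity of 1 with the audio embedding. Every other random vector scores lower, so the softmax peak lands on it.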

Bibliographic details
Main authors: Afouras, T, Owens, A, Chung, JS, Zisserman, A
Format: Conference item
Language: English
Published: Springer 2020