TIM: a time interval machine for audio-visual action recognition

Diverse actions give rise to rich audio-visual signals in long videos. Recent works showcase that the two modalities of audio and video exhibit different temporal extents of events and distinct labels. We address the interplay between the two modalities in long videos by explicitly modellin...

Bibliographic Details
Main Authors: Chalk, J, Huh, J, Kazakos, E, Zisserman, A, Damen, D
Format: Conference item
Language: English
Published: IEEE 2024