TIM: a time interval machine for audio-visual action recognition
Diverse actions give rise to rich audio-visual signals in long videos. Recent works showcase that the two modalities of audio and video exhibit different temporal extents of events and distinct labels. We address the interplay between the two modalities in long videos by explicitly modellin...
Main Authors: | Chalk, J, Huh, J, Kazakos, E, Zisserman, A, Damen, D |
---|---|
Format: | Conference item |
Language: | English |
Published: | IEEE 2024 |
Similar Items
- EPIC-fusion: audio-visual temporal binding for egocentric action recognition
  by: Kazakos, E, et al.
  Published: (2020)
- Epic-sounds: a large-scale dataset of actions that sound
  by: Huh, J, et al.
  Published: (2023)
- Slow-fast auditory streams for audio recognition
  by: Kazakos, E, et al.
  Published: (2021)
- With a little help from my temporal context: multimodal egocentric action recognition
  by: Kazakos, E, et al.
  Published: (2021)
- Deep audio-visual speech recognition
  by: Afouras, T, et al.
  Published: (2018)