Video action transformer network
We introduce the Action Transformer model for recognizing and localizing human actions in video clips. We repurpose a Transformer-style architecture to aggregate features from the spatiotemporal context around the person whose actions we are trying to classify. We show that by using high-resolution,...
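The abstract's core mechanism is attention from a per-person query onto the clip's spatiotemporal features. Below is a minimal, hypothetical PyTorch sketch of that idea; it is not the authors' code, and all names (`ActionTransformerHeadSketch`, `person_feat`, `context_feats`) and dimensions are our own assumptions for illustration.

```python
# Minimal sketch (assumption, not the paper's implementation): a query derived
# from a person's pooled region feature attends, Transformer-style, over the
# flattened spatiotemporal context features of the whole clip.
import torch
import torch.nn as nn

class ActionTransformerHeadSketch(nn.Module):
    """Hypothetical single attention unit: person query vs. clip context."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, dim * 2), nn.ReLU(),
                                 nn.Linear(dim * 2, dim))

    def forward(self, person_feat, context_feats):
        # person_feat:   (B, 1, D)      pooled feature for the person of interest
        # context_feats: (B, T*H*W, D)  flattened spatiotemporal feature map
        q = self.norm(person_feat)
        out, _ = self.attn(q, context_feats, context_feats)
        q = person_feat + out            # residual update of the person query
        return q + self.ffn(self.norm(q))

# Usage on dummy shapes: 2 clips, 64 frames x 16 x 16 context locations, D=128
head = ActionTransformerHeadSketch()
person = torch.randn(2, 1, 128)
context = torch.randn(2, 64 * 16 * 16, 128)
print(head(person, context).shape)  # torch.Size([2, 1, 128])
```

The refined query can then be fed to a classifier over action labels; stacking several such units lets the query accumulate context from multiple attention rounds.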
Main Authors: Girdhar, R; Carreira, J; Doersch, C; Zisserman, A
Format: Conference item
Language: English
Published: IEEE, 2020
Similar Items
- Massively parallel video networks
  by: Carreira, J, et al. Published: (2018)
- Two-stream convolutional networks for action recognition in videos
  by: Simonyan, K, et al. Published: (2014)
- Convolutional two-stream network fusion for video action recognition
  by: Feichtenhofer, C, et al. Published: (2016)
- Input-level inductive biases for 3D reconstruction
  by: Yifan, W, et al. Published: (2022)
- Learning from one continuous video stream
  by: Carreira, J, et al. Published: (2024)