Video action transformer network

We introduce the Action Transformer model for recognizing and localizing human actions in video clips. We repurpose a Transformer-style architecture to aggregate features from the spatiotemporal context around the person whose actions we are trying to classify. We show that by using high-resolution,...

Bibliographic Details
Main Authors: Girdhar, R; Carreira, J; Doersch, C; Zisserman, A
Format: Conference item
Language: English
Published: IEEE 2020