WLiT: Windows and Linear Transformer for Video Action Recognition
The emergence of Transformer has led to the rapid development of video understanding, but it also brings the problem of high computational complexity. Previously, there were methods to divide the feature maps into windows along the spatiotemporal dimensions and then calculate the attention. There ar...
Main Authors: | , , , , |
---|---|
Format: | Article |
Language: | English |
Published: |
MDPI AG
2023-02-01
|
Series: | Sensors |
Subjects: | |
Online Access: | https://www.mdpi.com/1424-8220/23/3/1616 |