ViGAT: Bottom-Up Event Recognition and Explanation in Video Using Factorized Graph Attention Network

In this paper a pure-attention bottom-up approach, called ViGAT, that utilizes an object detector together with a Vision Transformer (ViT) backbone network to derive object and frame features, and a head network to process these features for the task of event recognition and explanation in video, is...

Bibliographic Details
Main Authors:	Nikolaos Gkalelis, Dimitrios Daskalakis, Vasileios Mezaris
Format:	Article
Language:	English
Published:	IEEE 2022-01-01
Series:	IEEE Access
Subjects:	Video event recognition; eXplainable AI (XAI); graph attention network; factorized attention; bottom-up
Online Access:	https://ieeexplore.ieee.org/document/9915576/

Similar Items