TASTA: Text‐Assisted Spatial and Temporal Attention Network for Video Question Answering

Video question answering (VideoQA) is a typical task that integrates language and vision. The key to VideoQA is extracting relevant and effective visual information for answering a specific question. Information selection is believed to be necessary for this task due to the large amount of irreleva...

Bibliographic Details
Main Authors: Tian Wang, Boyao Hou, Jiakun Li, Peng Shi, Baochang Zhang, Hichem Snoussi
Format: Article
Language: English
Published: Wiley 2023-04-01
Series: Advanced Intelligent Systems
Online Access: https://doi.org/10.1002/aisy.202200131