TASTA: Text‐Assisted Spatial and Temporal Attention Network for Video Question Answering

Video question answering (VideoQA) is a typical task that integrates language and vision. The key for VideoQA is to extract relevant and effective visual information for answering a specific question. Information selection is believed to be necessary for this task due to the large amount of irreleva...

Bibliographic Details
Main Authors:	Tian Wang, Boyao Hou, Jiakun Li, Peng Shi, Baochang Zhang, Hichem Snoussi
Format:	Article
Language:	English
Published:	Wiley 2023-04-01
Series:	Advanced Intelligent Systems
Subjects:	attention mechanism; video question answering; visual question answering
Online Access:	https://doi.org/10.1002/aisy.202200131
