TASTA: Text‐Assisted Spatial and Temporal Attention Network for Video Question Answering
Video question answering (VideoQA) is a typical task that integrates language and vision. The key for VideoQA is to extract relevant and effective visual information for answering a specific question. Information selection is believed to be necessary for this task due to the large amount of irreleva...
Main Authors: | Tian Wang, Boyao Hou, Jiakun Li, Peng Shi, Baochang Zhang, Hichem Snoussi |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2023-04-01 |
Series: | Advanced Intelligent Systems |
Subjects: | |
Online Access: | https://doi.org/10.1002/aisy.202200131 |
Similar Items
- Standard refrigeration and air conditioning: questions and answers
  by: Elonka, Stephen Michael, et al.
  Published: (1973)
- Arabic Question Answering Systems: Gap Analysis
  by: Mariam M. Biltawi, et al.
  Published: (2021-01-01)
- Stumpers!: answers to hundreds of questions that stumped the experts
  by: Shapiro, Fred R.
  Published: (1998)
- Multi-Shared Attention with Global and Local Pathways for Video Question Answering
  by: WANG Lei-quan, HOU Wen-yan, YUAN Shao-zu, ZHAO Xin, LIN Yao, WU Chun-lei
  Published: (2021-08-01)
- Co-Attention Network With Question Type for Visual Question Answering
  by: Chao Yang, et al.
  Published: (2019-01-01)