TASTA: Text‐Assisted Spatial and Temporal Attention Network for Video Question Answering
Video question answering (VideoQA) is a typical task that integrates language and vision. The key for VideoQA is to extract relevant and effective visual information for answering a specific question. Information selection is believed to be necessary for this task due to the large amount of irreleva...
Main Authors: | , , , , , |
---|---|
Format: | Article |
Language: | English |
Published: |
Wiley
2023-04-01
|
Series: | Advanced Intelligent Systems |
Subjects: | |
Online Access: | https://doi.org/10.1002/aisy.202200131 |