Object sequences: encoding categorical and spatial information for a yes/no visual question answering task
Visual question answering (VQA) has gained wide popularity in recent years. Solving the VQA task effectively requires understanding both the visual content of the image and the language information in the text‐based question. In this study, the authors propose a novel...
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2018-12-01 |
Series: | IET Computer Vision |
Subjects: | |
Online Access: | https://doi.org/10.1049/iet-cvi.2018.5226 |