Debiasing visual question and answering with answer preference

Full description

Visual Question Answering (VQA) requires a model to generate a reasonable answer given an image and a corresponding question. It demands strong reasoning over two kinds of input features, namely the image and the question. However, most state-of-the-art results rely heavily on superficial correlations in the dataset, since delicately balancing the dataset is almost impossible. In this paper, we propose a simple method that uses answer preference to reduce the impact of dataset bias and improve the robustness of VQA models against prior changes. Two pipelines for using answer preference, one at the training stage and one at the inference stage, are evaluated and achieve genuine improvements on the VQA-CP dataset, which is designed to test the performance of VQA models under domain shift.
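Since this record only exposes the abstract, the following is a minimal illustrative sketch of the general idea described there, assuming "answer preference" refers to an answer prior conditioned on question type, that the training-stage pipeline down-weights examples whose ground-truth answer matches that prior, and that the inference-stage pipeline adjusts the predicted logits by the log-prior. All names (estimate_answer_prior, training_weight, debias_logits, the alpha parameter, the toy data) are hypothetical and are not taken from the thesis.

# Hedged sketch, not the thesis code: one plausible realisation of
# answer-preference debiasing for VQA at two stages:
#   (1) training  - inverse-prior loss weights so "preferred" answers contribute less,
#   (2) inference - subtract the log-prior from the answer logits.
import numpy as np
from collections import Counter, defaultdict


def estimate_answer_prior(question_types, answers):
    """Estimate p(answer | question type) from training annotations."""
    counts = defaultdict(Counter)
    for qt, ans in zip(question_types, answers):
        counts[qt][ans] += 1
    prior = {}
    for qt, ctr in counts.items():
        total = sum(ctr.values())
        prior[qt] = {a: c / total for a, c in ctr.items()}
    return prior


def training_weight(prior, question_type, answer, eps=1e-6):
    """Inverse-prior weight: frequent (preferred) answers get a smaller loss weight."""
    p = prior.get(question_type, {}).get(answer, eps)
    return 1.0 / (p + eps)


def debias_logits(logits, prior, question_type, answer_vocab, alpha=1.0):
    """Subtract alpha * log p(answer | question type) from test-time logits."""
    log_prior = np.array(
        [np.log(prior.get(question_type, {}).get(a, 1e-6)) for a in answer_vocab]
    )
    return logits - alpha * log_prior


if __name__ == "__main__":
    # Toy data: "how many" questions are heavily biased toward the answer "2".
    qts = ["how many"] * 8 + ["what color"] * 2
    ans = ["2"] * 6 + ["3", "4", "red", "blue"]
    prior = estimate_answer_prior(qts, ans)
    print(training_weight(prior, "how many", "2"))  # smaller weight for the biased answer
    print(training_weight(prior, "how many", "4"))  # larger weight for a rare answer
    vocab = ["2", "3", "4"]
    print(debias_logits(np.zeros(3), prior, "how many", vocab, alpha=1.0))

Under this assumption, the inference-stage adjustment lowers the score of the answer the prior favours ("2") relative to rarer answers, which is the kind of robustness to prior changes that VQA-CP is built to measure; the thesis may realise both pipelines quite differently.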

Bibliographic Details
Main Author: Zhang, Xinye
Other Authors: Zhang Hanwang (School of Computer Science and Engineering)
Format: Final Year Project (FYP)
Degree: Bachelor of Engineering (Computer Science)
Project Code: SCSE19-0193
Language: English
Published: Nanyang Technological University, 2020
Subjects: Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Online Access: https://hdl.handle.net/10356/137906