Visual Question Answering reasoning with external knowledge based on bimodal graph neural network

Visual Question Answering (VQA) with external knowledge requires both external knowledge and visual content to answer questions about images. A weakness of existing VQA solutions is that they must identify task-related information in the obtained images, questions, and knowledge graphs. It is necess...


Bibliographic Details
Main Authors:	Zhenyu Yang, Lei Wu, Peian Wen, Peng Chen
Format:	Article
Language:	English
Published:	AIMS Press 2023-02-01
Series:	Electronic Research Archive
Subjects:	visual question answering external knowledge bimodal fusion pre-trained language models knowledge graphs
Online Access:	https://www.aimspress.com/article/doi/10.3934/era.2023100?viewType=HTML

_version_	1797833198143537152
author	Zhenyu Yang Lei Wu Peian Wen Peng Chen
collection	DOAJ
description	Visual Question Answering (VQA) with external knowledge requires both external knowledge and visual content to answer questions about images. A weakness of existing VQA solutions is that they must identify task-related information in the obtained images, questions, and knowledge graphs. The information identified in the different modalities must then be properly fused and embedded to reduce noise and ease cross-modality reasoning in VQA models. However, how to rationally integrate information across modalities and reason jointly to find the evidence needed to predict the correct answer still deserves further study. This paper proposes a bimodal Graph Neural Network model combining pre-trained Language Models and Knowledge Graphs (BIGNN-LM-KG). We build the concept graph from the image concepts and the question concepts separately. In constructing the concept graph, we exploit the combined reasoning advantages of LM+KG. Specifically, we use the KG to jointly infer the image and question entity concepts and build the concept graph, and we use the LM to calculate correlation scores that screen the nodes and paths of the concept graph. We then form a visual graph from the visual and spatial features of the filtered image entities. An improved GNN learns the representations of the two graphs, and a modality fusion GNN fuses the information of the two modality graphs to predict the most likely answer. On a common VQA dataset, the proposed model obtains good experimental results. The experiments also verify the validity of each component of the model and the model's interpretability.
first_indexed	2024-04-09T14:19:36Z
format	Article
id	doaj.art-78a0bf51799b43a5aee066699958a729
institution	Directory Open Access Journal
issn	2688-1594
language	English
last_indexed	2024-04-09T14:19:36Z
publishDate	2023-02-01
publisher	AIMS Press
record_format	Article
series	Electronic Research Archive
spelling	doaj.art-78a0bf51799b43a5aee066699958a729; 2023-05-05T01:21:37Z; eng; AIMS Press; Electronic Research Archive; 2688-1594; 2023-02-01; 31; 4; 1948; 1965; 10.3934/era.2023100; Visual Question Answering reasoning with external knowledge based on bimodal graph neural network; Zhenyu Yang 0; Lei Wu 1; Peian Wen 2; Peng Chen 3; 1. Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 314099, China; 3. School of Computer and Software Engineering, Xihua University, Chengdu 610039, China; 2. School of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu 611731, China; 3. School of Computer and Software Engineering, Xihua University, Chengdu 610039, China; 3. School of Computer and Software Engineering, Xihua University, Chengdu 610039, China; https://www.aimspress.com/article/doi/10.3934/era.2023100?viewType=HTML; visual question answering; external knowledge; bimodal fusion; pre-trained language models; knowledge graphs
title	Visual Question Answering reasoning with external knowledge based on bimodal graph neural network
topic	visual question answering external knowledge bimodal fusion pre-trained language models knowledge graphs
url	https://www.aimspress.com/article/doi/10.3934/era.2023100?viewType=HTML


Similar Items