Illation of Video Visual Relation Detection Based on Graph Neural Network

The visual relation detection task bridges semantic text and visual information, expressing the content of images or videos through relation triples <subject, predicate, object>. This line of research can be applied to image question answering, vid...

Full description

Bibliographic Details
Main Authors: Mingcheng Qu, Jianxun Cui, Yuxi Nie, Tonghua Su
Format: Article
Language:English
Published: IEEE 2021-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/9547267/
author Mingcheng Qu
Jianxun Cui
Yuxi Nie
Tonghua Su
collection DOAJ
description The visual relation detection task bridges semantic text and visual information, expressing the content of images or videos through relation triples <subject, predicate, object>. This line of research can be applied to image question answering, video captioning, and other applications. However, using video as the input to visual relationship detection has received comparatively little attention. We therefore propose an algorithm based on a graph convolutional neural network and a multi-hypothesis tree to perform video relationship prediction. The video visual relationship detection algorithm is divided into three steps: first, the motion trajectories of the subject and object in the input video clip are generated; second, a VRGE network module based on the graph convolutional neural network predicts the relationships between objects in the video clip; finally, the relationship triplets are formed by fusing the per-clip visual relations through the multi-hypothesis fusion (MHF) algorithm. We verify our method on the benchmark ImageNet-VidVRD dataset. The experimental results demonstrate that the proposed method achieves an accuracy of 29.05% and a recall of 10.18% for visual relation detection.
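The description outlines a three-step pipeline (trajectory generation, graph-convolutional relation prediction, multi-hypothesis fusion). The minimal Python/PyTorch sketch below illustrates only the middle, GCN-based step under stated assumptions: it is not the authors' VRGE implementation, and the class name RelationGCN, the feature dimensions, and the fully connected toy adjacency are illustrative choices, not details taken from the paper.

# Minimal sketch (not the paper's code) of GCN-style predicate scoring over
# object-trajectory nodes: one graph-convolution step, then a classifier on
# each ordered (subject, object) pair of node embeddings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationGCN(nn.Module):
    def __init__(self, feat_dim: int, hidden_dim: int, num_predicates: int):
        super().__init__()
        self.gc = nn.Linear(feat_dim, hidden_dim)               # shared node transform
        self.classifier = nn.Linear(2 * hidden_dim, num_predicates)

    def forward(self, node_feats: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # node_feats: (N, feat_dim) trajectory features; adj: (N, N) adjacency.
        # Symmetric normalization: A_hat = D^-1/2 (A + I) D^-1/2.
        a_hat = adj + torch.eye(adj.size(0))
        deg_inv_sqrt = a_hat.sum(dim=1).clamp(min=1e-6).pow(-0.5)
        a_norm = deg_inv_sqrt.unsqueeze(1) * a_hat * deg_inv_sqrt.unsqueeze(0)
        h = F.relu(self.gc(a_norm @ node_feats))                 # (N, hidden_dim)

        # Score every ordered (subject, object) pair for every predicate class.
        n = h.size(0)
        subj = h.unsqueeze(1).expand(n, n, -1)
        obj = h.unsqueeze(0).expand(n, n, -1)
        return self.classifier(torch.cat([subj, obj], dim=-1))   # (N, N, num_predicates)

if __name__ == "__main__":
    # Toy usage: 4 object trajectories in one clip, fully connected graph;
    # 132 predicate classes is an illustrative vocabulary size.
    model = RelationGCN(feat_dim=128, hidden_dim=64, num_predicates=132)
    scores = model(torch.randn(4, 128), torch.ones(4, 4))
    print(scores.shape)  # torch.Size([4, 4, 132])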
format Article
id doaj.art-dd585be16d894eec836b8e7a7b9c2668
institution Directory Open Access Journal
issn 2169-3536
language English
publishDate 2021-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling Mingcheng Qu, Jianxun Cui, Yuxi Nie (https://orcid.org/0000-0001-6468-6898), and Tonghua Su (all: Department of Software, Harbin Institute of Technology, Harbin, China), "Illation of Video Visual Relation Detection Based on Graph Neural Network," IEEE Access, vol. 9, pp. 141144-141153, 2021, doi: 10.1109/ACCESS.2021.3115260 (IEEE Xplore document 9547267). ISSN 2169-3536. Record doaj.art-dd585be16d894eec836b8e7a7b9c2668, updated 2022-12-21T18:25:04Z.
title Illation of Video Visual Relation Detection Based on Graph Neural Network
topic Video visual relation detection
target detection
graph convolutional neural network
url https://ieeexplore.ieee.org/document/9547267/