Optimizing Continuous Prompts for Visual Relationship Detection by Affix-Tuning

Visual relationship detection is crucial for understanding visual scenes and is widely used in many areas, including visual navigation, visual question answering, and machine trouble detection. Traditional detection methods often fuse multiple region modules, which takes considerable time and resour...


Bibliographic Details
Main Authors: Shouguan Xiao, Weiping Fu
Format: Article
Language: English
Published: IEEE 2022-01-01
Series: IEEE Access
Subjects: Visual relationship detection; prompt template; affix-tuning transformers
Online Access: https://ieeexplore.ieee.org/document/9815128/
_version_ 1798040447197642752
author Shouguan Xiao
Weiping Fu
author_sort Shouguan Xiao
collection DOAJ
description Visual relationship detection is crucial for understanding visual scenes and is widely used in many areas, including visual navigation, visual question answering, and machine trouble detection. Traditional detection methods often fuse multiple region modules, which takes considerable time and resources to train every module with extensive samples. As every module is independent, the computation process has difficulty achieving unity and lacks a higher level of logical reasonability. In response to the above problems, we propose a novel method of affix-tuning transformers for visual relationship detection tasks, which keeps transformer model parameters frozen and optimizes a small continuous task-specific vector. It not only makes the model unified and reduces the training cost but also maintains the common-sense reasonability without multiscale training. In addition, we design a vision-and-language sentence expression prompt template and train a few transformer model parameters for downstream tasks. Our method, Prompt Template and Affix-Tuning Transformers (PTAT), is evaluated on visual relationship detection and Visual Genome datasets. Finally, the results of the proposed method are close to or even higher than those of the state-of-the-art methods on some evaluation metrics.
first_indexed 2024-04-11T22:07:42Z
format Article
id doaj.art-1e7fa7edb9554336ab561fd13eca3119
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-04-11T22:07:42Z
publishDate 2022-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-1e7fa7edb9554336ab561fd13eca3119 | 2022-12-22T04:00:38Z | eng | IEEE | IEEE Access | 2169-3536 | 2022-01-01 | 10 | 70104 | 70112 | 10.1109/ACCESS.2022.3187263 | 9815128 | Optimizing Continuous Prompts for Visual Relationship Detection by Affix-Tuning | Shouguan Xiao (0) https://orcid.org/0000-0002-7169-8114 | Weiping Fu (1) | School of Mechanical and Precision Instrument Engineering, Xi’an University of Technology, Xi’an, China | School of Mechanical and Precision Instrument Engineering, Xi’an University of Technology, Xi’an, China | Visual relationship detection is crucial for understanding visual scenes and is widely used in many areas, including visual navigation, visual question answering, and machine trouble detection. Traditional detection methods often fuse multiple region modules, which takes considerable time and resources to train every module with extensive samples. As every module is independent, the computation process has difficulty achieving unity and lacks a higher level of logical reasonability. In response to the above problems, we propose a novel method of affix-tuning transformers for visual relationship detection tasks, which keeps transformer model parameters frozen and optimizes a small continuous task-specific vector. It not only makes the model unified and reduces the training cost but also maintains the common-sense reasonability without multiscale training. In addition, we design a vision-and-language sentence expression prompt template and train a few transformer model parameters for downstream tasks. Our method, Prompt Template and Affix-Tuning Transformers (PTAT), is evaluated on visual relationship detection and Visual Genome datasets. Finally, the results of the proposed method are close to or even higher than those of the state-of-the-art methods on some evaluation metrics. | https://ieeexplore.ieee.org/document/9815128/ | Visual relationship detection | prompt template | affix-tuning transformers
title Optimizing Continuous Prompts for Visual Relationship Detection by Affix-Tuning
topic Visual relationship detection
prompt template
affix-tuning transformers
url https://ieeexplore.ieee.org/document/9815128/