Optimizing Continuous Prompts for Visual Relationship Detection by Affix-Tuning
Visual relationship detection is crucial for understanding visual scenes and is widely used in many areas, including visual navigation, visual question answering, and machine trouble detection. Traditional detection methods often fuse multiple region modules, which takes considerable time and resour...
Main Authors: | Shouguan Xiao, Weiping Fu |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2022-01-01 |
Series: | IEEE Access |
Subjects: | Visual relationship detection; prompt template; affix-tuning transformers |
Online Access: | https://ieeexplore.ieee.org/document/9815128/ |
_version_ | 1798040447197642752 |
---|---|
author | Shouguan Xiao; Weiping Fu |
author_facet | Shouguan Xiao; Weiping Fu |
author_sort | Shouguan Xiao |
collection | DOAJ |
description | Visual relationship detection is crucial for understanding visual scenes and is widely used in many areas, including visual navigation, visual question answering, and machine trouble detection. Traditional detection methods often fuse multiple region modules, which takes considerable time and resources to train every module with extensive samples. As every module is independent, the computation process has difficulty achieving unity and lacks a higher level of logical reasonability. In response to the above problems, we propose a novel method of affix-tuning transformers for visual relationship detection tasks, which keeps transformer model parameters frozen and optimizes a small continuous task-specific vector. It not only makes the model unified and reduces the training cost but also maintains the common-sense reasonability without multiscale training. In addition, we design a vision-and-language sentence expression prompt template and train a few transformer model parameters for downstream tasks. Our method, Prompt Template and Affix-Tuning Transformers (PTAT), is evaluated on visual relationship detection and Visual Genome datasets. Finally, the results of the proposed method are close to or even higher than those of the state-of-the-art methods on some evaluation metrics. |
first_indexed | 2024-04-11T22:07:42Z |
format | Article |
id | doaj.art-1e7fa7edb9554336ab561fd13eca3119 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-04-11T22:07:42Z |
publishDate | 2022-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-1e7fa7edb9554336ab561fd13eca3119; 2022-12-22T04:00:38Z; eng; IEEE; IEEE Access; 2169-3536; 2022-01-01; 10; 70104; 70112; 10.1109/ACCESS.2022.3187263; 9815128; Optimizing Continuous Prompts for Visual Relationship Detection by Affix-Tuning; Shouguan Xiao [0], https://orcid.org/0000-0002-7169-8114; Weiping Fu [1]; School of Mechanical and Precision Instrument Engineering, Xi’an University of Technology, Xi’an, China; School of Mechanical and Precision Instrument Engineering, Xi’an University of Technology, Xi’an, China; https://ieeexplore.ieee.org/document/9815128/; Visual relationship detection; prompt template; affix-tuning transformers |
title | Optimizing Continuous Prompts for Visual Relationship Detection by Affix-Tuning |
topic | Visual relationship detection; prompt template; affix-tuning transformers |
url | https://ieeexplore.ieee.org/document/9815128/ |