An efficient and accurate 2D human pose estimation method using VTTransPose network

Abstract: Human pose estimation is a crucial area of study in computer vision. Transformer-based pose estimation algorithms have gained popularity for their excellent performance and relatively compact parameterization. However, these algorithms often face challenges, including high computational demands and insensitivity to local details. To address these problems, the Twin attention module was introduced into TransPose to improve model efficiency and reduce resource consumption. Additionally, to address insufficient joint feature representation and poor network recognition performance, the enhanced model, named VTTransPose, replaces the basic block in the third subnet with the intra-level feature fusion module (V block). The performance of the proposed VTTransPose model was validated on the public datasets COCO val2017 and COCO test-dev2017, where it achieves AP scores of 76.5 and 73.6 respectively, improvements of 0.4 and 0.2 over the original TransPose network. Additionally, compared to the original TransPose model, VTTransPose requires 4.8 GFLOPs less computation, 2M fewer parameters, and approximately 40% less memory during training. All the experimental results demonstrate that the proposed VTTransPose is more accurate, efficient, and lightweight than the original TransPose model.
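
The abstract only sketches the architecture, so the following is a minimal, hypothetical PyTorch sketch of a twin-style attention block in the spirit it describes: local windowed self-attention paired with global attention over sub-sampled keys/values, so that no attention matrix is quadratic in the full feature-map size. The class name, parameters, and tensor layout are assumptions for illustration and are not taken from the paper.

```python
# Hypothetical sketch of a twin-style attention block (not the paper's code).
# Assumes H and W are divisible by both `window` and `sr_ratio`.
import torch
import torch.nn as nn


class TwinStyleAttention(nn.Module):
    """Local windowed self-attention followed by global attention over
    pooled (sub-sampled) keys/values on a CNN feature map."""

    def __init__(self, dim: int, num_heads: int = 8, window: int = 8, sr_ratio: int = 4):
        super().__init__()
        self.window = window
        self.local_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.global_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.sr = nn.AvgPool2d(sr_ratio, stride=sr_ratio)  # shrinks keys/values
        self.norm_local = nn.LayerNorm(dim)
        self.norm_global = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map from the CNN backbone
        b, c, h, w = x.shape
        win = self.window

        # Local branch: self-attention inside non-overlapping win x win windows.
        xl = x.view(b, c, h // win, win, w // win, win)
        xl = xl.permute(0, 2, 4, 3, 5, 1).reshape(-1, win * win, c)
        xl = self.norm_local(xl)
        xl, _ = self.local_attn(xl, xl, xl)
        xl = xl.reshape(b, h // win, w // win, win, win, c)
        xl = xl.permute(0, 5, 1, 3, 2, 4).reshape(b, c, h, w)
        x = x + xl  # residual connection

        # Global branch: every position attends to a pooled version of the map.
        q = self.norm_global(x.flatten(2).transpose(1, 2))            # (B, H*W, C)
        kv = self.norm_global(self.sr(x).flatten(2).transpose(1, 2))  # (B, H*W/r^2, C)
        xg, _ = self.global_attn(q, kv, kv)
        return x + xg.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    # Feature-map size typical of top-down pose estimation (256x192 input, stride 4).
    block = TwinStyleAttention(dim=64)
    out = block(torch.randn(2, 64, 64, 48))
    print(out.shape)  # torch.Size([2, 64, 64, 48])
```

Under these assumptions, the local branch scales linearly with the number of windows, and the global branch attends to a key/value set reduced by a factor of sr_ratio squared, which is the kind of restructuring that would account for the reported drop in FLOPs and memory relative to full self-attention.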

Bibliographic Details
Main Authors: Rui Li, Qi Li, Shiqiang Yang, Xin Zeng, An Yan
Format: Article
Language: English
Published: Nature Portfolio, 2024-03-01
Series: Scientific Reports
ISSN: 2045-2322
Author Affiliation: School of Mechanical and Precision Instrument Engineering, Xi’an University of Technology (all authors)
Subjects: Human pose estimation; Transformer; Twin attention; Feature fusion
Online Access: https://doi.org/10.1038/s41598-024-58175-8