An efficient and accurate 2D human pose estimation method using VTTransPose network
Abstract Human pose estimation is a crucial area of study in computer vision. Transformer-based pose estimation algorithms have gained popularity for their excellent performance and relatively compact parameterization. However, these algorithms often face challenges including high computational demands...
Main Authors: | Rui Li, Qi Li, Shiqiang Yang, Xin Zeng, An Yan |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2024-03-01 |
Series: | Scientific Reports |
Subjects: | Human pose estimation; Transformer; Twin attention; Feature fusion |
Online Access: | https://doi.org/10.1038/s41598-024-58175-8 |
author | Rui Li, Qi Li, Shiqiang Yang, Xin Zeng, An Yan |
author_sort | Rui Li |
collection | DOAJ |
description | Abstract Human pose estimation is a crucial area of study in computer vision. Transformer-based pose estimation algorithms have gained popularity for their excellent performance and relatively compact parameterization. However, these algorithms often face challenges, including high computational demands and insensitivity to local detail. To address these problems, the Twin attention module was introduced into TransPose to improve model efficiency and reduce resource consumption. Additionally, to address insufficient joint feature representation and poor recognition performance, the basic block in the third subnet was replaced with the intra-level feature fusion module (the V block); the enhanced model is named VTTransPose. The proposed VTTransPose model was validated on the public datasets COCO val2017 and COCO test-dev2017, where it achieves AP scores of 76.5 and 73.6 respectively, improvements of 0.4 and 0.2 over the original TransPose network. Compared with the original model, VTTransPose also uses 4.8 GFLOPs less computation, 2M fewer parameters, and approximately 40% less memory during training. All the experimental results demonstrate that the proposed VTTransPose is more accurate, efficient, and lightweight than the original TransPose model. |
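The abstract describes the two architectural changes only at a high level. As a rough orientation, here is a minimal PyTorch sketch of a generic twin-branch attention block, pairing full attention restricted to local windows with global attention over spatially sub-sampled keys and values, which is the usual way a "twin attention" design trades accuracy for compute. The class and parameter names (`TwinAttentionSketch`, `window`, `sr_ratio`) are hypothetical, and the structure is assumed from the general twin-attention pattern rather than taken from the VTTransPose paper.

```python
import torch
import torch.nn as nn

class TwinAttentionSketch(nn.Module):
    """Illustrative twin-branch attention: local windowed attention,
    then global attention against a sub-sampled key/value map."""

    def __init__(self, dim=256, heads=8, window=8, sr_ratio=4):
        super().__init__()
        self.window = window
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        # Local branch: full attention restricted to non-overlapping windows.
        self.local_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Global branch: keys/values come from a spatially reduced map,
        # which is what keeps the global step cheap.
        self.reduce = nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio)
        self.global_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        # x: (B, C, H, W); H and W are assumed divisible by the window size.
        B, C, H, W = x.shape
        w = self.window
        # Local windowed attention: treat each w*w window as a short sequence.
        t = x.reshape(B, C, H // w, w, W // w, w)
        t = t.permute(0, 2, 4, 3, 5, 1).reshape(-1, w * w, C)
        n = self.norm1(t)
        t = t + self.local_attn(n, n, n, need_weights=False)[0]
        t = t.reshape(B, H // w, W // w, w, w, C)
        x = t.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)
        # Global attention: every position queries the reduced key/value map.
        q = x.flatten(2).transpose(1, 2)                # (B, H*W, C)
        kv = self.reduce(x).flatten(2).transpose(1, 2)  # (B, H*W / sr_ratio^2, C)
        out = q + self.global_attn(self.norm2(q), kv, kv, need_weights=False)[0]
        return out.transpose(1, 2).reshape(B, C, H, W)

# Example: a 256-channel feature map at the 64x48 resolution typical of
# COCO pose backbones.
feats = TwinAttentionSketch()(torch.randn(1, 256, 64, 48))
```

The AP scores quoted above (76.5 on COCO val2017, 73.6 on test-dev2017) follow the standard COCO keypoint evaluation protocol, which is ordinarily computed with the pycocotools package. A minimal sketch of that evaluation, with placeholder file names for the prediction file:

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Ground-truth keypoint annotations and a JSON of model predictions in the
# standard COCO results format (both file names here are placeholders).
coco_gt = COCO("annotations/person_keypoints_val2017.json")
coco_dt = coco_gt.loadRes("vttranspose_val2017_results.json")

evaluator = COCOeval(coco_gt, coco_dt, iouType="keypoints")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints AP, AP@.5, AP@.75, AP(M), AP(L), and AR
```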
format | Article |
id | doaj.art-9d92f30e1e3f421d86c07e68790e46ef |
institution | Directory Open Access Journal |
issn | 2045-2322 |
language | English |
publishDate | 2024-03-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Reports |
spelling | doaj.art-9d92f30e1e3f421d86c07e68790e46ef; English; Nature Portfolio; Scientific Reports; ISSN 2045-2322; 2024-03-01; vol. 14, no. 1, pp. 1–12; doi:10.1038/s41598-024-58175-8; An efficient and accurate 2D human pose estimation method using VTTransPose network; Rui Li, Qi Li, Shiqiang Yang, Xin Zeng, An Yan (all: School of Mechanical and Precision Instrument Engineering, Xi’an University of Technology); https://doi.org/10.1038/s41598-024-58175-8; keywords: Human pose estimation, Transformer, Twin attention, Feature fusion |
title | An efficient and accurate 2D human pose estimation method using VTTransPose network |
topic | Human pose estimation; Transformer; Twin attention; Feature fusion |
url | https://doi.org/10.1038/s41598-024-58175-8 |