Cofopose: Conditional 2D Pose Estimation with Transformers
Human pose estimation has long been a fundamental problem in computer vision and artificial intelligence. Prominent among the 2D human pose estimation (HPE) methods are the regression-based approaches, which have been proven to achieve excellent results. However, the ground-truth labels are usually...
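Main Authors: | Evans Aidoo, Xun Wang, Zhenguang Liu, Edwin Kwadwo Tenagyei, Kwabena Owusu-Agyemang, Seth Larweh Kodjiku, Victor Nonso Ejianya, Esther Stacy E. B. Aggrey |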
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-09-01 |
Series: | Sensors |
Subjects: | DETR, human pose estimation, conditional DETR, convolutional neural network (CNN), detection |
Online Access: | https://www.mdpi.com/1424-8220/22/18/6821 |
_version_ | 1797482632482652160 |
---|---|
author | Evans Aidoo; Xun Wang; Zhenguang Liu; Edwin Kwadwo Tenagyei; Kwabena Owusu-Agyemang; Seth Larweh Kodjiku; Victor Nonso Ejianya; Esther Stacy E. B. Aggrey |
author_sort | Evans Aidoo |
collection | DOAJ |
description | Human pose estimation has long been a fundamental problem in computer vision and artificial intelligence. Prominent among the 2D human pose estimation (HPE) methods are the regression-based approaches, which have been proven to achieve excellent results. However, the ground-truth labels are usually inherently ambiguous in challenging cases such as motion blur, occlusions, and truncation, leading to poor performance measurement and lower levels of accuracy. In this paper, we propose Cofopose, a two-stage approach consisting of person and keypoint detection transformers for 2D human pose estimation. Cofopose is composed of conditional cross-attention, a conditional DEtection TRansformer (conditional DETR), and an encoder-decoder in the transformer framework; this allows it to achieve person and keypoint detection. In a significant departure from other approaches, we use conditional cross-attention and fine-tune conditional DETR for our person detection, and encoder-decoders in the transformers for our keypoint detection. Cofopose was extensively evaluated using two benchmark datasets, MS COCO and MPII, achieving improved performance with significant margins over the existing state-of-the-art frameworks. |
first_indexed | 2024-03-09T22:35:13Z |
format | Article |
id | doaj.art-cfa0639a61f647e6936a0cad8c34a11e |
institution | Directory of Open Access Journals |
issn | 1424-8220 |
language | English |
last_indexed | 2024-03-09T22:35:13Z |
publishDate | 2022-09-01 |
publisher | MDPI AG |
record_format | Article |
series | Sensors |
spelling | doaj.art-cfa0639a61f647e6936a0cad8c34a11e; 2023-11-23T18:49:50Z; eng; MDPI AG; Sensors; ISSN 1424-8220; 2022-09-01; vol. 22, no. 18, art. 6821; DOI 10.3390/s22186821; Cofopose: Conditional 2D Pose Estimation with Transformers; Evans Aidoo (School of Computer & Information Engineering, Zhejiang Gongshang University, Hangzhou 310018, China); Xun Wang (School of Computer & Information Engineering, Zhejiang Gongshang University, Hangzhou 310018, China); Zhenguang Liu (School of Computer & Information Engineering, Zhejiang Gongshang University, Hangzhou 310018, China); Edwin Kwadwo Tenagyei (School of Information & Software Engineering, University of Electronic Science & Technology of China, Chengdu 611731, China); Kwabena Owusu-Agyemang (Department of Computer Science, Kwame Nkrumah University of Science and Technology (KNUST), Kumasi 03220, Ghana); Seth Larweh Kodjiku (School of Computer & Information Engineering, Zhejiang Gongshang University, Hangzhou 310018, China); Victor Nonso Ejianya (School of Computer & Information Engineering, Zhejiang Gongshang University, Hangzhou 310018, China); Esther Stacy E. B. Aggrey (School of Information & Software Engineering, University of Electronic Science & Technology of China, Chengdu 611731, China); https://www.mdpi.com/1424-8220/22/18/6821; DETR; human pose estimation; conditional DETR; convolutional neural network (CNN); detection |
title | Cofopose: Conditional 2D Pose Estimation with Transformers |
topic | DETR; human pose estimation; conditional DETR; convolutional neural network (CNN); detection |
url | https://www.mdpi.com/1424-8220/22/18/6821 |
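
The description field above names conditional cross-attention as the mechanism behind the person-detection stage. As a rough illustration only, and not the authors' implementation, the sketch below shows the general idea usually associated with conditional DETR: the query and each key are concatenations of a content part and a spatial part, so the attention score splits into a content term plus a spatial term. The function name, the single-head setup, the absence of learned projections, and the toy vectors are all assumptions made for this example.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def conditional_cross_attention(content_q, spatial_q,
                                content_keys, spatial_keys, values):
    """Hypothetical single-query, single-head attention.

    The query is concat(content_q, spatial_q) and each key is
    concat(content_key, spatial_key), so each score equals
    content_q . content_key + spatial_q . spatial_key (before scaling).
    """
    query = content_q + spatial_q  # list concatenation
    scale = math.sqrt(len(query))
    scores = [dot(query, ck + sk) / scale
              for ck, sk in zip(content_keys, spatial_keys)]
    weights = softmax(scores)
    dim_v = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values))
              for i in range(dim_v)]
    return weights, output


# Toy input: both keys have identical content, so only the spatial part
# can tell them apart; the spatial query points toward the second key.
weights, output = conditional_cross_attention(
    content_q=[1.0, 0.0],
    spatial_q=[0.0, 2.0],
    content_keys=[[1.0, 0.0], [1.0, 0.0]],
    spatial_keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[10.0], [20.0]],
)
print(weights, output)
```

In this toy case the second key receives the larger weight purely because of the spatial term, which is the behavior the decoupled score is meant to allow. A real model would add learned projections, multiple heads, and image features from a backbone.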