Cofopose: Conditional 2D Pose Estimation with Transformers


Bibliographic Details
Main Authors: Evans Aidoo, Xun Wang, Zhenguang Liu, Edwin Kwadwo Tenagyei, Kwabena Owusu-Agyemang, Seth Larweh Kodjiku, Victor Nonso Ejianya, Esther Stacy E. B. Aggrey
Format: Article
Language: English
Published: MDPI AG 2022-09-01
Series: Sensors
Subjects: DETR; human pose estimation; conditional DETR; convolutional neural network (CNN); detection
Online Access: https://www.mdpi.com/1424-8220/22/18/6821
author Evans Aidoo
Xun Wang
Zhenguang Liu
Edwin Kwadwo Tenagyei
Kwabena Owusu-Agyemang
Seth Larweh Kodjiku
Victor Nonso Ejianya
Esther Stacy E. B. Aggrey
collection DOAJ
description Human pose estimation has long been a fundamental problem in computer vision and artificial intelligence. Prominent among 2D human pose estimation (HPE) methods are the regression-based approaches, which have been shown to achieve excellent results. However, ground-truth labels are often inherently ambiguous in challenging cases such as motion blur, occlusion, and truncation, leading to poor performance measurement and lower accuracy. In this paper, we propose Cofopose, a two-stage approach consisting of person- and keypoint-detection transformers for 2D human pose estimation. Cofopose combines conditional cross-attention, a conditional DEtection TRansformer (conditional DETR), and an encoder-decoder within the transformer framework to perform both person and keypoint detection. In a significant departure from other approaches, we use conditional cross-attention and a fine-tuned conditional DETR for person detection, and transformer encoder-decoders for keypoint detection. Cofopose was extensively evaluated on two benchmark datasets, MS COCO and MPII, where it improves on existing state-of-the-art frameworks by significant margins.
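The record carries no source code; as a rough, hypothetical illustration of the two-stage pipeline the abstract describes (a person detector followed by a transformer encoder-decoder that decodes keypoint queries from each person crop), the PyTorch-style sketch below is an assumption throughout: the class name, the toy convolutional stem, the dimensions, and the random stand-in crop are all illustrative, and the paper's conditional cross-attention mechanism is not reproduced here.

```python
import torch
import torch.nn as nn


class KeypointTransformer(nn.Module):
    """Stage 2 sketch: decode one learned query per keypoint from encoded crop features."""

    def __init__(self, d_model=256, num_keypoints=17, nhead=8, num_layers=4):
        super().__init__()
        # Tiny convolutional stem standing in for a real backbone (assumption).
        self.patchify = nn.Conv2d(3, d_model, kernel_size=16, stride=16)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers)
        # One learned query per keypoint (17 joints in MS COCO).
        self.keypoint_queries = nn.Parameter(torch.randn(num_keypoints, d_model))
        self.coord_head = nn.Linear(d_model, 2)  # predicts normalized (x, y)

    def forward(self, person_crop):
        # person_crop: (B, 3, H, W) crop produced by the stage-1 person detector.
        feats = self.patchify(person_crop)            # (B, C, h, w)
        tokens = feats.flatten(2).transpose(1, 2)     # (B, h*w, C) image tokens
        memory = self.encoder(tokens)                 # encoded features for cross-attention
        queries = self.keypoint_queries.unsqueeze(0).expand(person_crop.size(0), -1, -1)
        decoded = self.decoder(queries, memory)       # (B, K, C)
        return self.coord_head(decoded).sigmoid()     # (B, K, 2) coordinates in [0, 1]


if __name__ == "__main__":
    # Stage 1 (a conditional-DETR-style person detector) would supply person boxes;
    # here a random 256x192 tensor stands in for one cropped person.
    crop = torch.randn(1, 3, 256, 192)
    print(KeypointTransformer()(crop).shape)  # torch.Size([1, 17, 2])
```

The one-query-per-keypoint decoding shown here mirrors the general DETR-style move from heatmap regression to direct set prediction; how Cofopose conditions those queries is described only in the full article.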
format Article
id doaj.art-cfa0639a61f647e6936a0cad8c34a11e
institution Directory Open Access Journal
issn 1424-8220
language English
publishDate 2022-09-01
publisher MDPI AG
series Sensors
doi 10.3390/s22186821
citation Sensors, Vol. 22, No. 18, Article 6821, published 2022-09-01 by MDPI AG
affiliations Evans Aidoo, Xun Wang, Zhenguang Liu, Seth Larweh Kodjiku, Victor Nonso Ejianya: School of Computer & Information Engineering, Zhejiang Gongshang University, Hangzhou 310018, China
Edwin Kwadwo Tenagyei, Esther Stacy E. B. Aggrey: School of Information & Software Engineering, University of Electronic Science & Technology of China, Chengdu 611731, China
Kwabena Owusu-Agyemang: Department of Computer Science, Kwame Nkrumah University of Science and Technology (KNUST), Kumasi 03220, Ghana
title Cofopose: Conditional 2D Pose Estimation with Transformers
topic DETR
human pose estimation
conditional DETR
convolutional neural network (CNN)
detection
url https://www.mdpi.com/1424-8220/22/18/6821