Top-Down System for Multi-Person 3D Absolute Pose Estimation from Monocular Videos
Two-dimensional (2D) multi-person pose estimation and three-dimensional (3D) root-relative pose estimation from a monocular RGB camera have made significant progress recently. Yet, real-world applications require depth estimations and the ability to determine the distances between people in a scene....
Main Authors: | Amal El Kaid, Denis Brazey, Vincent Barra, Karim Baïna |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-05-01 |
Series: | Sensors |
Subjects: | 3D multi-person pose estimation; absolute poses; camera-centric coordinates; computer vision; artificial intelligence; deep-learning |
Online Access: | https://www.mdpi.com/1424-8220/22/11/4109 |
_version_ | 1797491692390055936 |
---|---|
author | Amal El Kaid, Denis Brazey, Vincent Barra, Karim Baïna
author_sort | Amal El Kaid |
collection | DOAJ |
description | Two-dimensional (2D) multi-person pose estimation and three-dimensional (3D) root-relative pose estimation from a monocular RGB camera have made significant progress recently. Yet, real-world applications require depth estimation and the ability to determine the distances between people in a scene. It is therefore necessary to recover the 3D absolute poses of several people, which remains challenging with a single-viewpoint camera. Furthermore, previously proposed systems typically require substantial computational resources and memory. To overcome these restrictions, we propose a real-time framework for multi-person 3D absolute pose estimation from a monocular camera that integrates a human detector, a 2D pose estimator, a 3D root-relative pose reconstructor, and a root depth estimator in a top-down manner. The proposed system, called Root-GAST-Net, is based on modified versions of the GAST-Net and RootNet networks. Its efficiency is demonstrated through quantitative and qualitative evaluations on two benchmark datasets, Human3.6M and MuPoTS-3D. On all evaluated metrics, our results on the MuPoTS-3D dataset outperform the current state of the art by a significant margin, and the system runs in real time at 15 fps on an Nvidia GeForce GTX 1080. (A minimal illustrative sketch of such a top-down pipeline follows this record.)
first_indexed | 2024-03-10T00:52:59Z |
format | Article |
id | doaj.art-7f5470321fa449cbb02bedacfec01d48 |
institution | Directory Open Access Journal |
issn | 1424-8220 |
language | English |
last_indexed | 2024-03-10T00:52:59Z |
publishDate | 2022-05-01 |
publisher | MDPI AG |
record_format | Article |
series | Sensors |
spelling | doaj.art-7f5470321fa449cbb02bedacfec01d48; 2023-11-23T14:48:45Z; eng; MDPI AG; Sensors; ISSN 1424-8220; 2022-05-01; 22(11): 4109; doi:10.3390/s22114109; Top-Down System for Multi-Person 3D Absolute Pose Estimation from Monocular Videos; Amal El Kaid (Université Clermont-Auvergne, CNRS, Mines de Saint-Étienne, Clermont-Auvergne-INP, LIMOS, 63000 Clermont-Ferrand, France); Denis Brazey (Société Prynel, RD974, 21190 Corpeau, France); Vincent Barra (Université Clermont-Auvergne, CNRS, Mines de Saint-Étienne, Clermont-Auvergne-INP, LIMOS, 63000 Clermont-Ferrand, France); Karim Baïna (Alqualsadi Research Team, Rabat IT Center, ENSIAS, Mohammed V University in Rabat, Rabat 10112, Morocco); abstract and keywords as in the description and topic fields; https://www.mdpi.com/1424-8220/22/11/4109
title | Top-Down System for Multi-Person 3D Absolute Pose Estimation from Monocular Videos |
topic | 3D multi-person pose estimation; absolute poses; camera-centric coordinates; computer vision; artificial intelligence; deep-learning
url | https://www.mdpi.com/1424-8220/22/11/4109 |
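The description above characterizes Root-GAST-Net as a top-down pipeline that chains a human detector, a 2D pose estimator, a 3D root-relative pose reconstructor, and a root depth estimator to obtain camera-centric (absolute) poses. Purely as an illustration of how such a top-down composition fits together, here is a minimal Python sketch. The stage interfaces (`detector`, `pose_2d`, `lift_3d`, `root_depth`), the joint ordering, and the pinhole back-projection of the root joint are assumptions made for exposition; they are not the authors' implementation nor the actual GAST-Net or RootNet APIs.

```python
# Minimal sketch of a top-down multi-person 3D absolute pose pipeline.
# All stage callables are hypothetical placeholders, not real library APIs.

from typing import Callable, List
import numpy as np


def backproject_root(root_uv: np.ndarray, depth_mm: float, K: np.ndarray) -> np.ndarray:
    """Lift the root (pelvis) pixel to camera-centric coordinates via a pinhole model."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (root_uv[0] - cx) * depth_mm / fx
    y = (root_uv[1] - cy) * depth_mm / fy
    return np.array([x, y, depth_mm])


def estimate_absolute_poses(
    frame: np.ndarray,
    K: np.ndarray,                                            # 3x3 camera intrinsics
    detector: Callable[[np.ndarray], List[np.ndarray]],       # frame -> list of person crops
    pose_2d: Callable[[np.ndarray], np.ndarray],              # crop -> (J, 2) joints, full-image pixels
    lift_3d: Callable[[np.ndarray], np.ndarray],              # (J, 2) -> (J, 3) root-relative joints (mm)
    root_depth: Callable[[np.ndarray, np.ndarray], float],    # crop, K -> root depth (mm)
    root_index: int = 0,                                      # assumes the pelvis is joint 0
) -> List[np.ndarray]:
    """Top-down: detect people, then per person estimate 2D pose, root-relative 3D pose,
    and root depth; combine them into camera-centric absolute 3D poses."""
    absolute_poses = []
    for crop in detector(frame):
        joints_2d = pose_2d(crop)              # 2D joints, assumed already mapped to image coordinates
        rel_3d = lift_3d(joints_2d)            # (J, 3) with the root joint at the origin
        z_root = root_depth(crop, K)           # metric depth of the root joint
        root_cam = backproject_root(joints_2d[root_index], z_root, K)
        absolute_poses.append(rel_3d + root_cam)   # shift the relative pose to camera coordinates
    return absolute_poses
```

The essential step is placing the root joint in camera coordinates from its pixel location and estimated depth; adding that offset to the root-relative joints yields each person's absolute 3D pose.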