Top-Down System for Multi-Person 3D Absolute Pose Estimation from Monocular Videos

Two-dimensional (2D) multi-person pose estimation and three-dimensional (3D) root-relative pose estimation from a monocular RGB camera have made significant progress recently. Yet, real-world applications require depth estimations and the ability to determine the distances between people in a scene....


Bibliographic Details
Main Authors: Amal El Kaid, Denis Brazey, Vincent Barra, Karim Baïna
Format: Article
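Language: English
Published: MDPI AG 2022-05-01
Series: Sensors
Subjects: 3D multi-person pose estimation; absolute poses; camera-centric coordinates; computer vision; artificial intelligence; deep-learning
Online Access: https://www.mdpi.com/1424-8220/22/11/4109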
author Amal El Kaid
Denis Brazey
Vincent Barra
Karim Baïna
collection DOAJ
description Two-dimensional (2D) multi-person pose estimation and three-dimensional (3D) root-relative pose estimation from a monocular RGB camera have made significant progress recently. Yet real-world applications require depth estimation and the ability to determine the distances between people in a scene. It is therefore necessary to recover the 3D absolute poses of several people. However, this remains a challenge when using a camera with a single point of view. Furthermore, previously proposed systems typically require a significant amount of resources and memory. To overcome these restrictions, we propose a real-time framework for multi-person 3D absolute pose estimation from a monocular camera, which integrates a human detector, a 2D pose estimator, a 3D root-relative pose reconstructor, and a root depth estimator in a top-down manner. The proposed system, called Root-GAST-Net, is based on modified versions of the GAST-Net and RootNet networks. The efficiency of the proposed Root-GAST-Net system is demonstrated through quantitative and qualitative evaluations on two benchmark datasets, Human3.6M and MuPoTS-3D. On all evaluated metrics, our results on the MuPoTS-3D dataset outperform the current state of the art by a significant margin, and the system runs in real time at 15 fps on an Nvidia GeForce GTX 1080.
first_indexed 2024-03-10T00:52:59Z
format Article
id doaj.art-7f5470321fa449cbb02bedacfec01d48
institution Directory Open Access Journal
issn 1424-8220
language English
last_indexed 2024-03-10T00:52:59Z
publishDate 2022-05-01
publisher MDPI AG
record_format Article
series Sensors
spelling doaj.art-7f5470321fa449cbb02bedacfec01d48
Timestamp: 2023-11-23T14:48:45Z
Language: eng
Publisher: MDPI AG
Journal: Sensors
ISSN: 1424-8220
Date: 2022-05-01
Volume: 22
Issue: 11
Article number: 4109
DOI: 10.3390/s22114109
Title: Top-Down System for Multi-Person 3D Absolute Pose Estimation from Monocular Videos
Amal El Kaid (Université Clermont-Auvergne, CNRS, Mines de Saint-Étienne, Clermont-Auvergne-INP, LIMOS, 63000 Clermont-Ferrand, France)
Denis Brazey (Société Prynel, RD974, 21190 Corpeau, France)
Vincent Barra (Université Clermont-Auvergne, CNRS, Mines de Saint-Étienne, Clermont-Auvergne-INP, LIMOS, 63000 Clermont-Ferrand, France)
Karim Baïna (Alqualsadi Research Team, Rabat IT Center, ENSIAS, Mohammed V University in Rabat, Rabat 10112, Morocco)
URL: https://www.mdpi.com/1424-8220/22/11/4109
Keywords: 3D multi-person pose estimation; absolute poses; camera-centric coordinates; computer vision; artificial intelligence; deep-learning
title Top-Down System for Multi-Person 3D Absolute Pose Estimation from Monocular Videos
topic 3D multi-person pose estimation
absolute poses
camera-centric coordinates
computer vision
artificial intelligence
deep-learning
url https://www.mdpi.com/1424-8220/22/11/4109