Monocular camera-based 3D human body pose estimation by Generative Adversarial Network considering joint range of motion represented by quaternion


Bibliographic Details
Main Authors: Akisue KURAMOTO, Kosuke MIZUKOSHI, Motomu NAKASHIMA
Format: Article
Language: English
Published: The Japan Society of Mechanical Engineers, 2023-01-01
Series: Journal of Biomechanical Science and Engineering
Subjects:
Online Access: https://www.jstage.jst.go.jp/article/jbse/18/2/18_22-00305/_pdf/-char/en
Description
Summary: This study aims to establish a monocular camera-based system that uses a generative adversarial network (GAN) to estimate the 3D pose of a human and his/her orientation relative to the camera from images by considering anatomical knowledge such as segment length and joint range of motion. The proposed network is trained by unsupervised learning using only 2D joint positions as training data, i.e., it does not require ground-truth 3D joint positions and angles in real space for training. Unsupervised learning of the proposed network was performed with a new loss function consisting of the typical GAN loss function and three new terms, which impose constraints on the quaternion norm, the joint range of motion, and the similarity between the real and fake 2D poses, respectively. Numerical validation was performed using the Human3.6M human pose dataset, which includes images, 2D poses, and 3D joint angles and positions measured in real space. The results show that the proposed network is slightly less accurate than a depth estimation method obtained by supervised learning, but is as accurate as a depth estimation method obtained by unsupervised learning using a GAN. However, qualitative comparisons of 3D pose plots suggest that the joint range-of-motion constraints introduced in this paper are effective in estimating 3D poses without anatomical failures. Particularly in scenes with large flexion of the upper and lower limbs, the proposed network avoids anatomical failures in the estimated 3D pose, whereas the depth estimation methods do not. In addition, the proposed network can adapt itself to cameras with unknown external parameters.
ISSN: 1880-9863
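
The summary above describes a generator loss composed of the typical GAN loss plus three constraint terms: the quaternion norm, the joint range of motion, and the similarity between the real and fake 2D poses. The Python sketch below illustrates one plausible way such a composite loss could be assembled. It is not the authors' implementation; the paper's exact formulation and weighting are not given in this record, and all names here (rom_min, rom_max, the w_* weights, etc.) are hypothetical placeholders.

# Minimal illustrative sketch (NumPy), assuming per-joint quaternions, per-joint
# angle limits in radians, and a precomputed 2D reprojection of the estimated 3D pose.
import numpy as np

def quaternion_norm_loss(q):
    """Penalize predicted joint quaternions that deviate from unit norm.
    q: (num_joints, 4) array of predicted rotations."""
    norms = np.linalg.norm(q, axis=1)
    return np.mean((norms - 1.0) ** 2)

def joint_rom_loss(angles, rom_min, rom_max):
    """Penalize joint angles outside an assumed anatomical range of motion.
    angles, rom_min, rom_max: (num_joints, 3) arrays in radians (assumed layout)."""
    below = np.maximum(rom_min - angles, 0.0)
    above = np.maximum(angles - rom_max, 0.0)
    return np.mean(below ** 2 + above ** 2)

def reprojection_similarity_loss(pose_2d_input, pose_2d_reprojected):
    """Penalize disagreement between the observed 2D pose and the 2D projection
    of the estimated 3D pose (the real vs. fake 2D pose similarity term)."""
    return np.mean(np.sum((pose_2d_input - pose_2d_reprojected) ** 2, axis=1))

def total_generator_loss(gan_loss, q, angles, rom_min, rom_max,
                         pose_2d_input, pose_2d_reprojected,
                         w_norm=1.0, w_rom=1.0, w_sim=1.0):
    """Typical GAN generator loss plus the three constraint terms.
    The w_* weights are placeholders, not values from the paper."""
    return (gan_loss
            + w_norm * quaternion_norm_loss(q)
            + w_rom * joint_rom_loss(angles, rom_min, rom_max)
            + w_sim * reprojection_similarity_loss(pose_2d_input, pose_2d_reprojected))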