Human Motion Prediction Based on a Multi-Scale Hypergraph for Intangible Cultural Heritage Dance Videos


Bibliographic Details
Main Authors: Xingquan Cai, Pengyan Cheng, Shike Liu, Haoyu Zhang, Haiyan Sun
Format: Article
Language: English
Published: MDPI AG 2023-11-01
Series: Electronics
Online Access: https://www.mdpi.com/2079-9292/12/23/4830
Description
Summary: Compared to traditional dance, intangible cultural heritage dance often involves the isotropic extension of choreographic actions, utilizing both upper and lower limbs. This characteristic choreography style leaves remote joints with little interaction, which reduces the accuracy of existing human motion prediction methods. We therefore propose a human motion prediction method based on a multi-scale hypergraph convolutional network for intangible cultural heritage dance videos. First, the method takes as input the 3D human posture sequence from an intangible cultural heritage dance video. A hypergraph is designed according to the synergistic relationships of the human joints in the video and is used to represent the spatial correlation of the 3D human posture. Then, a multi-scale hypergraph convolutional network is constructed, using multi-scale transformation operators to partition the human skeleton into different scales. The network adopts a graph structure to represent the 3D human posture at each scale, and a single-scale fusion operator extracts spatial features from the 3D human posture sequence by fusing the feature information of the hypergraph and the multi-scale graphs. Finally, a Temporal Graph Transformer network is introduced to capture the temporal dependence among adjacent frames in the time domain, extracting temporal features from the 3D human posture sequence and ultimately predicting future 3D human posture sequences. Experiments show that our method achieves the best performance in both short-term and long-term human motion prediction compared with the Motion-Mixer and Motion-Attention algorithms on the Human3.6M and 3DPW datasets.
In addition, ablation experiments show that our method can predict more precise 3D human pose sequences, even in the presence of isotropic extensions of upper and lower limbs in intangible cultural heritage dance videos. This approach effectively addresses the issue of missing segments in intangible cultural heritage dance videos.
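The abstract describes representing the 3D pose with a hypergraph whose hyperedges group synergistic joints, then applying hypergraph convolution. The paper's exact operators are not reproduced in this record; the sketch below uses the standard hypergraph convolution X' = Dv^(-1/2) H De^(-1) H^T Dv^(-1/2) X Θ (with unit hyperedge weights) on a toy five-joint skeleton. The hyperedge groupings, feature sizes, and the `hypergraph_conv` helper are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def hypergraph_conv(X, H, Theta):
    """One hypergraph convolution layer (standard HGNN-style propagation).

    X:     (J, F) per-joint features (e.g., 3D coordinates).
    H:     (J, E) incidence matrix; H[j, e] = 1 if joint j is in hyperedge e.
    Theta: (F, F_out) projection matrix (learnable in a real network).
    """
    Dv = np.diag(H.sum(axis=1))                  # vertex degrees
    De = np.diag(H.sum(axis=0))                  # hyperedge degrees
    Dv_inv_sqrt = np.linalg.inv(np.sqrt(Dv))
    A = Dv_inv_sqrt @ H @ np.linalg.inv(De) @ H.T @ Dv_inv_sqrt
    return A @ X @ Theta

# Toy skeleton: 5 joints, 2 hyperedges standing in for "upper-limb" and
# "lower-limb" synergy groups (a simplification of the dance-specific
# hyperedges described in the abstract).
H = np.array([[1, 0],
              [1, 0],
              [1, 1],    # torso joint shared by both groups
              [0, 1],
              [0, 1]], dtype=float)
X = np.random.randn(5, 3)       # 3D coordinates per joint
Theta = np.random.randn(3, 8)   # random stand-in for learned weights
out = hypergraph_conv(X, H, Theta)
print(out.shape)  # (5, 8)
```

Because every joint in a hyperedge aggregates from all other members of that hyperedge, even spatially remote joints (e.g., a hand and a foot placed in the same synergy group) exchange information in a single layer, which is the interaction the abstract says plain skeletal graphs lack.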
ISSN:2079-9292