Summary: | In recent years, skeleton-based action recognition has received extensive attention, and a large body of research has achieved excellent performance. This paper investigates an unsupervised domain adaptation (UDA) method, STT-DA, for skeleton-based action recognition. This setting is challenging in real-world scenarios. In UDA, labels are available only in the source domain, not in the target domain. Unlike traditional UDA approaches such as adversarial learning-based methods, this paper adopts a Transformer mechanism based on cross-attention to align the two domains. The model learns from both the source and target domains to reduce the domain shift between different skeleton datasets, thereby reducing the effect of the pseudo-label errors generated during domain adaptation. To account for the particularities of skeleton data, this paper proposes a bidirectional normalized alignment algorithm. The algorithm aligns skeleton sequences from the source and target domains and explores feature representations in both the spatial and temporal dimensions. In particular, it focuses on the adjacency dependency of skeleton joints: each joint is represented as a weighted sum of its adjacent joints. This lets the network attend to the global characteristics of skeleton data while still considering the local characteristics of joint connections. In addition, skeleton sequences are divided into several parts, called subs, to reduce the time cost of the model. We conduct experiments on five datasets for skeleton-based action recognition, including two large-scale datasets (NTU RGB+D, NW-UCLA). Extensive results show that the proposed method (STT-DA) reaches an accuracy of 82.5% on the NTU→UCLA domain adaptation task and also performs better on the other datasets.
Applying the skeleton sequence alignment algorithm and local attention weights improves accuracy substantially.
|
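The summary does not give STT-DA's exact layers. The following pure-Python sketch illustrates three generic ingredients it mentions, under stated assumptions, and is not the authors' bidirectional normalized alignment algorithm.

- Joint aggregation uses the symmetrically normalized adjacency D^{-1/2}(A+I)D^{-1/2}. This is a common graph-convolution choice assumed here so that each joint becomes a weighted sum of itself and its adjacent joints.
- Single-head scaled dot-product cross-attention takes queries from one domain and keys/values from the other.
- A sequence is split into contiguous "subs" of frames.

All function names (`normalized_adjacency`, `aggregate_joints`, `cross_attention`, `split_into_subs`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def normalized_adjacency(edges: Sequence[Tuple[int, int]], num_joints: int) -> Matrix:
    """Assumed normalization: D^{-1/2} (A + I) D^{-1/2} over an undirected skeleton graph."""
    # Self-loops (identity) so each joint also keeps its own feature.
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)] for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]


def aggregate_joints(x: Matrix, adj: Matrix) -> Matrix:
    """x has shape [num_joints][channels]; each output joint is a weighted sum of its neighbours."""
    return matmul(adj, x)


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: queries from one domain, keys/values from the other."""
    d = len(keys[0])
    scores = [[sum(q[c] * k[c] for c in range(d)) / math.sqrt(d) for k in keys]
              for q in queries]
    weights = [softmax(r) for r in scores]
    return matmul(weights, values)


def split_into_subs(sequence: Sequence, num_subs: int) -> List[list]:
    """Split frames into at most num_subs contiguous chunks (the last may be shorter)."""
    size = math.ceil(len(sequence) / num_subs)
    return [list(sequence[i:i + size]) for i in range(0, len(sequence), size)]


if __name__ == "__main__":
    # A 3-joint chain 0-1-2 with one feature channel per joint.
    adj = normalized_adjacency([(0, 1), (1, 2)], 3)
    print(aggregate_joints([[1.0], [1.0], [1.0]], adj))
    # Target-domain query attending over two source-domain keys.
    print(cross_attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[2.0], [4.0]]))
    print(split_into_subs(list(range(10)), 3))
```

With this normalization, a joint with many neighbours contributes less to each of them. Cross-attention lets every target-domain feature be re-expressed as a convex combination of source-domain features, which is the general intuition behind attention-based alignment.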