Summary: Facial pose variation presents a significant challenge to facial expression recognition (FER) in real‐world applications. Multiview facial expression recognition (MFER) faces two major bottlenecks: the lack of high‐quality MFER datasets and the limited robustness of models in real‐world MFER scenarios. To address these, this article first introduces a metahuman‐based MFER dataset (MMED), which addresses the insufficient quantity and quality of existing datasets. Second, it proposes a conditional cascade VGG (ccVGG) model that adaptively adjusts expression feature extraction based on the pose information of the input image. Finally, it proposes a hybrid training and few‐shot learning strategy. This strategy integrates the MMED dataset with a real‐world dataset and enables rapid deployment in real‐world application scenarios through the proposed Meta‐Dist few‐shot learning method. Experiments on the Karolinska Directed Emotional Faces (KDEF) dataset show that the proposed model is more robust in multiview application scenarios and improves recognition accuracy by 28.68% relative to the baseline. These results demonstrate that the MMED dataset can effectively improve the training efficiency of MFER models and facilitate their deployment in real‐world applications. This work provides a reliable dataset for MFER studies and paves the way for robust FER from any view in real‐world deployments.