Summary: | Abstract Facial beauty analysis is an important topic in human society and can guide face beautification applications such as cosmetic surgery. Deep neural networks (DNNs) have recently been adopted for facial beauty analysis and have achieved remarkable performance. However, most existing DNN‐based models treat facial beauty analysis as an ordinary classification task. They ignore important prior knowledge from traditional machine learning models, which demonstrate the significant contribution of geometric features to facial beauty analysis. Specifically, those models use landmarks of the whole face and of the facial organs to extract the geometric features on which the decision is based. Inspired by this, we introduce a novel dual‐branch network for facial beauty analysis: one branch uses the Swin Transformer as its backbone to model the full face and global patterns, and the other uses a residual network on the masked facial organs to model the local patterns of specific facial parts. Additionally, a multi‐scale feature fusion module helps the network learn complementary semantic information between the two branches. For model optimisation, we propose a hybrid loss function that includes a geometric regularisation term: regressing the facial landmarks forces the extracted features to convey facial geometric information. Experiments on the SCUT‐FBP5500 and SCUT‐FBP datasets demonstrate that our model outperforms state‐of‐the‐art convolutional neural network models, confirming the effectiveness of the proposed geometric regularisation and of the dual‐branch structure with the hybrid network. To the best of our knowledge, this is the first study to introduce a Vision Transformer into the facial beauty analysis task.
|
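
The hybrid loss described in the abstract combines a beauty-score objective with a landmark-regression term that acts as geometric regularisation. The abstract does not give the exact form of either term, so the following is only a minimal sketch under stated assumptions: both terms are taken to be squared errors, and the function name `hybrid_loss` and the weight `geo_weight` are hypothetical, introduced here for illustration rather than taken from the paper.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) coordinates of one facial landmark


def hybrid_loss(
    pred_score: float,
    true_score: float,
    pred_landmarks: Sequence[Point],
    true_landmarks: Sequence[Point],
    geo_weight: float = 0.5,  # hypothetical trade-off weight, not from the paper
) -> float:
    """Score loss plus a weighted landmark-regression (geometric) term.

    Assumption: both terms are squared errors; the paper may use other forms.
    """
    if len(pred_landmarks) != len(true_landmarks) or not true_landmarks:
        raise ValueError("landmark lists must be non-empty and of equal length")

    # Squared error on the predicted beauty score.
    score_term = (pred_score - true_score) ** 2

    # Mean squared Euclidean distance between predicted and true landmarks.
    squared_dists = [
        (px - tx) ** 2 + (py - ty) ** 2
        for (px, py), (tx, ty) in zip(pred_landmarks, true_landmarks)
    ]
    geo_term = sum(squared_dists) / len(squared_dists)

    return score_term + geo_weight * geo_term
```

With `geo_weight` set to zero the sketch reduces to the score loss alone, which makes it easy to compare training with and without the geometric term.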