Face-based age estimation using improved Swin Transformer with attention-based convolution
Recently, Transformer models have become a new direction in the computer vision field; they are based on the multihead self-attention mechanism. Compared with convolutional neural networks, the Transformer uses the self-attention mechanism to capture global contextual information and extract stronger features...
Main Authors: | Chaojun Shi, Shiwei Zhao, Ke Zhang, Yibo Wang, Longping Liang |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2023-04-01 |
Series: | Frontiers in Neuroscience |
Subjects: | age estimation; Swin Transformer; attention mechanism; deep learning; neural networks |
Online Access: | https://www.frontiersin.org/articles/10.3389/fnins.2023.1136934/full |
_version_ | 1797848201964814336 |
---|---|
author | Chaojun Shi Chaojun Shi Shiwei Zhao Ke Zhang Ke Zhang Yibo Wang Longping Liang |
author_facet | Chaojun Shi Chaojun Shi Shiwei Zhao Ke Zhang Ke Zhang Yibo Wang Longping Liang |
author_sort | Chaojun Shi |
collection | DOAJ |
description | Recently, Transformer models have become a new direction in the computer vision field; they are based on the multihead self-attention mechanism. Compared with convolutional neural networks, the Transformer uses the self-attention mechanism to capture global contextual information and extract stronger features by learning the association relationships between different features, and it has achieved good results in many vision tasks. In face-based age estimation, some facial patches that contain rich age-specific information are critical to the age estimation task. The present study proposed an attention-based convolution (ABC) age estimation framework, called improved Swin Transformer with ABC, in which two separate modules were implemented, namely ABC and the Swin Transformer. ABC extracted facial patches containing rich age-specific information using a shallow convolutional network and a multihead attention mechanism. Subsequently, the features obtained by ABC were spliced with the flattened image in the Swin Transformer, and the result was input to the Swin Transformer to predict the age of the image. The ABC framework spliced the important regions containing rich age-specific information into the original image, which could fully exploit the long-range dependency modeling of the Swin Transformer, that is, extracting stronger features by learning the dependency relationships between different features. ABC also introduced a diversity loss to guide the training of the self-attention mechanism, reducing overlap between patches so that diverse and important patches were discovered. Extensive experiments showed that the proposed framework outperformed several state-of-the-art methods on age estimation benchmark datasets. |
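The description above says that ABC's patch features are spliced with the flattened image before being fed to the Swin Transformer. As a rough illustration of that splicing step, here is a minimal NumPy sketch; the function name, the shapes, and the non-overlapping 4×4 patch embedding are assumptions for illustration, not the authors' exact implementation:

```python
import numpy as np

def splice_tokens(image, abc_features, patch=4):
    """Concatenate ABC patch features with the flattened image tokens.

    image: (H, W, C) face image; abc_features: (M, patch*patch*C) features
    for M important facial patches found by ABC (hypothetical shapes).
    The image is split into non-overlapping patch x patch windows and each
    window is flattened into one token (as in a Swin-style patch embedding),
    then the ABC tokens are appended so the Transformer can attend across
    both the full image and the age-rich patches.
    """
    h, w, c = image.shape
    tokens = (image.reshape(h // patch, patch, w // patch, patch, c)
                   .transpose(0, 2, 1, 3, 4)          # group windows together
                   .reshape(-1, patch * patch * c))   # (H/4 * W/4, 4*4*C)
    return np.concatenate([tokens, abc_features], axis=0)
```

For an 8×8×3 image with a 4×4 patch size this yields 4 image tokens of dimension 48, plus however many ABC tokens are appended.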
first_indexed | 2024-04-09T18:24:42Z |
format | Article |
id | doaj.art-c4f25cb4203e40d9af40615e195f23a8 |
institution | Directory Open Access Journal |
issn | 1662-453X |
language | English |
last_indexed | 2024-04-09T18:24:42Z |
publishDate | 2023-04-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Neuroscience |
spelling | doaj.art-c4f25cb4203e40d9af40615e195f23a82023-04-12T05:00:58ZengFrontiers Media S.A.Frontiers in Neuroscience1662-453X2023-04-011710.3389/fnins.2023.11369341136934Face-based age estimation using improved Swin Transformer with attention-based convolutionChaojun Shi0Chaojun Shi1Shiwei Zhao2Ke Zhang3Ke Zhang4Yibo Wang5Longping Liang6Department of Electronic and Communication Engineering, North China Electric Power University, Baoding, Hebei, ChinaHebei Key Laboratory of Power Internet of Things Technology, North China Electric Power University, Baoding, Hebei, ChinaDepartment of Electronic and Communication Engineering, North China Electric Power University, Baoding, Hebei, ChinaDepartment of Electronic and Communication Engineering, North China Electric Power University, Baoding, Hebei, ChinaHebei Key Laboratory of Power Internet of Things Technology, North China Electric Power University, Baoding, Hebei, ChinaDepartment of Electronic and Communication Engineering, North China Electric Power University, Baoding, Hebei, ChinaDepartment of Electronic and Communication Engineering, North China Electric Power University, Baoding, Hebei, ChinaRecently, Transformer models have become a new direction in the computer vision field; they are based on the multihead self-attention mechanism. Compared with convolutional neural networks, the Transformer uses the self-attention mechanism to capture global contextual information and extract stronger features by learning the association relationships between different features, and it has achieved good results in many vision tasks. In face-based age estimation, some facial patches that contain rich age-specific information are critical to the age estimation task. The present study proposed an attention-based convolution (ABC) age estimation framework, called improved Swin Transformer with ABC, in which two separate modules were implemented, namely ABC and the Swin Transformer.
ABC extracted facial patches containing rich age-specific information using a shallow convolutional network and a multihead attention mechanism. Subsequently, the features obtained by ABC were spliced with the flattened image in the Swin Transformer, and the result was input to the Swin Transformer to predict the age of the image. The ABC framework spliced the important regions containing rich age-specific information into the original image, which could fully exploit the long-range dependency modeling of the Swin Transformer, that is, extracting stronger features by learning the dependency relationships between different features. ABC also introduced a diversity loss to guide the training of the self-attention mechanism, reducing overlap between patches so that diverse and important patches were discovered. Extensive experiments showed that the proposed framework outperformed several state-of-the-art methods on age estimation benchmark datasets.https://www.frontiersin.org/articles/10.3389/fnins.2023.1136934/fullage estimationSwin Transformerattention mechanismdeep learningneural networks |
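The diversity loss mentioned in the abstract is not specified in this record; a common way to reduce overlap between attention maps is to penalize their pairwise inner products after normalization. A minimal NumPy sketch under that assumption (the function name and the L1-normalized dot-product formulation are illustrative, not the authors' exact loss):

```python
import numpy as np

def diversity_loss(attn_maps):
    """Pairwise-overlap penalty over K attention maps.

    attn_maps: array of shape (K, N) -- K attention maps flattened over
    N spatial positions. Each map is L1-normalized, then the mean dot
    product between distinct maps is returned; minimizing it pushes the
    maps toward disjoint facial patches.
    """
    a = attn_maps / (attn_maps.sum(axis=1, keepdims=True) + 1e-8)
    gram = a @ a.T                       # (K, K) pairwise overlaps
    k = a.shape[0]
    off_diag = gram.sum() - np.trace(gram)
    return off_diag / (k * (k - 1))      # mean overlap between distinct maps
```

Perfectly disjoint maps give a loss of 0, while identical maps give the maximum overlap for their support, so adding this term to the training objective discourages the attention heads from all selecting the same region.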
spellingShingle | Chaojun Shi Chaojun Shi Shiwei Zhao Ke Zhang Ke Zhang Yibo Wang Longping Liang Face-based age estimation using improved Swin Transformer with attention-based convolution Frontiers in Neuroscience age estimation Swin Transformer attention mechanism deep learning neural networks |
title | Face-based age estimation using improved Swin Transformer with attention-based convolution |
title_full | Face-based age estimation using improved Swin Transformer with attention-based convolution |
title_fullStr | Face-based age estimation using improved Swin Transformer with attention-based convolution |
title_full_unstemmed | Face-based age estimation using improved Swin Transformer with attention-based convolution |
title_short | Face-based age estimation using improved Swin Transformer with attention-based convolution |
title_sort | face based age estimation using improved swin transformer with attention based convolution |
topic | age estimation Swin Transformer attention mechanism deep learning neural networks |
url | https://www.frontiersin.org/articles/10.3389/fnins.2023.1136934/full |
work_keys_str_mv | AT chaojunshi facebasedageestimationusingimprovedswintransformerwithattentionbasedconvolution AT chaojunshi facebasedageestimationusingimprovedswintransformerwithattentionbasedconvolution AT shiweizhao facebasedageestimationusingimprovedswintransformerwithattentionbasedconvolution AT kezhang facebasedageestimationusingimprovedswintransformerwithattentionbasedconvolution AT kezhang facebasedageestimationusingimprovedswintransformerwithattentionbasedconvolution AT yibowang facebasedageestimationusingimprovedswintransformerwithattentionbasedconvolution AT longpingliang facebasedageestimationusingimprovedswintransformerwithattentionbasedconvolution |