Face-based age estimation using improved Swin Transformer with attention-based convolution

Transformer models, built on the multi-head self-attention mechanism, have recently become a new direction in the computer vision field. Compared with convolutional neural networks, the Transformer uses self-attention to capture global contextual information and extract stronger features...

Full description

Bibliographic Details
Main Authors: Chaojun Shi, Shiwei Zhao, Ke Zhang, Yibo Wang, Longping Liang
Format: Article
Language: English
Published: Frontiers Media S.A. 2023-04-01
Series: Frontiers in Neuroscience
Subjects:
Online Access: https://www.frontiersin.org/articles/10.3389/fnins.2023.1136934/full
_version_ 1797848201964814336
author Chaojun Shi
Chaojun Shi
Shiwei Zhao
Ke Zhang
Ke Zhang
Yibo Wang
Longping Liang
author_facet Chaojun Shi
Chaojun Shi
Shiwei Zhao
Ke Zhang
Ke Zhang
Yibo Wang
Longping Liang
author_sort Chaojun Shi
collection DOAJ
description Transformer models, built on the multi-head self-attention mechanism, have recently become a new direction in the computer vision field. Compared with convolutional neural networks, the Transformer uses self-attention to capture global contextual information and extract stronger features by learning the relationships between different features, and it has achieved good results in many vision tasks. In face-based age estimation, some facial patches that contain rich age-specific information are critical to the task. The present study proposed an attention-based convolution (ABC) age estimation framework, called improved Swin Transformer with ABC, in which two separate modules were implemented, namely ABC and the Swin Transformer. ABC extracted facial patches containing rich age-specific information using a shallow convolutional network and a multi-head attention mechanism. The features obtained by ABC were then spliced with the flattened image tokens and input to the Swin Transformer to predict the age of the image. The ABC framework spliced the important regions containing rich age-specific information onto the original image, which fully mobilized the long-range dependency modeling of the Swin Transformer, that is, extracting stronger features by learning the dependency relationships between different features. ABC also introduced a diversity loss to guide the training of the self-attention mechanism, reducing overlap between patches so that diverse and important patches were discovered. Extensive experiments showed that the proposed framework outperformed several state-of-the-art methods on age estimation benchmark datasets.
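The abstract mentions a diversity loss that reduces overlap between the attended facial patches but does not give its formula. One common way to express such a penalty is the mean pairwise inner product of the K attention distributions: it is zero when the patches are disjoint and grows as they overlap. The sketch below is illustrative only; the function name and exact formulation are assumptions, not taken from the paper.

```python
def diversity_loss(attn_maps):
    """Mean pairwise inner product of K attention distributions.

    attn_maps: list of K equal-length lists of non-negative weights,
    each summing to 1 (e.g. softmax outputs over image locations).
    Returns 0.0 for disjoint maps; larger values mean more overlap,
    so minimizing this term pushes the K patches apart.
    """
    k = len(attn_maps)
    if k < 2:
        return 0.0
    total, pairs = 0.0, 0
    for i in range(k):
        for j in range(i + 1, k):
            total += sum(a * b for a, b in zip(attn_maps[i], attn_maps[j]))
            pairs += 1
    return total / pairs

# Two one-hot maps attending to different locations: no overlap
print(diversity_loss([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]))  # 0.0
# Two identical maps: maximal overlap for this spread
print(diversity_loss([[0.5, 0.5, 0.0], [0.5, 0.5, 0.0]]))  # 0.5
```

Added to a standard age-regression objective with a small weight, a term like this encourages the multi-head attention to pick out diverse, non-redundant facial regions, matching the behavior the abstract describes.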
first_indexed 2024-04-09T18:24:42Z
format Article
id doaj.art-c4f25cb4203e40d9af40615e195f23a8
institution Directory Open Access Journal
issn 1662-453X
language English
last_indexed 2024-04-09T18:24:42Z
publishDate 2023-04-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Neuroscience
spelling doaj.art-c4f25cb4203e40d9af40615e195f23a82023-04-12T05:00:58ZengFrontiers Media S.A.Frontiers in Neuroscience1662-453X2023-04-011710.3389/fnins.2023.11369341136934Face-based age estimation using improved Swin Transformer with attention-based convolutionChaojun Shi0Chaojun Shi1Shiwei Zhao2Ke Zhang3Ke Zhang4Yibo Wang5Longping Liang6Department of Electronic and Communication Engineering, North China Electric Power University, Baoding, Hebei, ChinaHebei Key Laboratory of Power Internet of Things Technology, North China Electric Power University, Baoding, Hebei, ChinaDepartment of Electronic and Communication Engineering, North China Electric Power University, Baoding, Hebei, ChinaDepartment of Electronic and Communication Engineering, North China Electric Power University, Baoding, Hebei, ChinaHebei Key Laboratory of Power Internet of Things Technology, North China Electric Power University, Baoding, Hebei, ChinaDepartment of Electronic and Communication Engineering, North China Electric Power University, Baoding, Hebei, ChinaDepartment of Electronic and Communication Engineering, North China Electric Power University, Baoding, Hebei, ChinaTransformer models, built on the multi-head self-attention mechanism, have recently become a new direction in the computer vision field. Compared with convolutional neural networks, the Transformer uses self-attention to capture global contextual information and extract stronger features by learning the relationships between different features, and it has achieved good results in many vision tasks. In face-based age estimation, some facial patches that contain rich age-specific information are critical to the task. The present study proposed an attention-based convolution (ABC) age estimation framework, called improved Swin Transformer with ABC, in which two separate modules were implemented, namely ABC and the Swin Transformer. 
ABC extracted facial patches containing rich age-specific information using a shallow convolutional network and a multi-head attention mechanism. The features obtained by ABC were then spliced with the flattened image tokens and input to the Swin Transformer to predict the age of the image. The ABC framework spliced the important regions containing rich age-specific information onto the original image, which fully mobilized the long-range dependency modeling of the Swin Transformer, that is, extracting stronger features by learning the dependency relationships between different features. ABC also introduced a diversity loss to guide the training of the self-attention mechanism, reducing overlap between patches so that diverse and important patches were discovered. Extensive experiments showed that the proposed framework outperformed several state-of-the-art methods on age estimation benchmark datasets.https://www.frontiersin.org/articles/10.3389/fnins.2023.1136934/fullage estimationSwin Transformerattention mechanismdeep learningneural networks
spellingShingle Chaojun Shi
Chaojun Shi
Shiwei Zhao
Ke Zhang
Ke Zhang
Yibo Wang
Longping Liang
Face-based age estimation using improved Swin Transformer with attention-based convolution
Frontiers in Neuroscience
age estimation
Swin Transformer
attention mechanism
deep learning
neural networks
title Face-based age estimation using improved Swin Transformer with attention-based convolution
title_full Face-based age estimation using improved Swin Transformer with attention-based convolution
title_fullStr Face-based age estimation using improved Swin Transformer with attention-based convolution
title_full_unstemmed Face-based age estimation using improved Swin Transformer with attention-based convolution
title_short Face-based age estimation using improved Swin Transformer with attention-based convolution
title_sort face based age estimation using improved swin transformer with attention based convolution
topic age estimation
Swin Transformer
attention mechanism
deep learning
neural networks
url https://www.frontiersin.org/articles/10.3389/fnins.2023.1136934/full
work_keys_str_mv AT chaojunshi facebasedageestimationusingimprovedswintransformerwithattentionbasedconvolution
AT chaojunshi facebasedageestimationusingimprovedswintransformerwithattentionbasedconvolution
AT shiweizhao facebasedageestimationusingimprovedswintransformerwithattentionbasedconvolution
AT kezhang facebasedageestimationusingimprovedswintransformerwithattentionbasedconvolution
AT kezhang facebasedageestimationusingimprovedswintransformerwithattentionbasedconvolution
AT yibowang facebasedageestimationusingimprovedswintransformerwithattentionbasedconvolution
AT longpingliang facebasedageestimationusingimprovedswintransformerwithattentionbasedconvolution