Face-based age estimation using improved Swin Transformer with attention-based convolution

Transformer models, built on the multi-head self-attention mechanism, have recently become a new direction in the computer vision field. Compared with convolutional neural networks, the Transformer uses self-attention to capture global contextual information and extract stronger features...

Full description

Bibliographic Details
Main Authors: Chaojun Shi, Shiwei Zhao, Ke Zhang, Yibo Wang, Longping Liang
Format: Article
Language: English
Published: Frontiers Media S.A. 2023-04-01
Series: Frontiers in Neuroscience
Subjects:
Online Access: https://www.frontiersin.org/articles/10.3389/fnins.2023.1136934/full
_version_ 1797848201964814336
author Chaojun Shi
Chaojun Shi
Shiwei Zhao
Ke Zhang
Ke Zhang
Yibo Wang
Longping Liang
author_facet Chaojun Shi
Chaojun Shi
Shiwei Zhao
Ke Zhang
Ke Zhang
Yibo Wang
Longping Liang
author_sort Chaojun Shi
collection DOAJ
description Transformer models, built on the multi-head self-attention mechanism, have recently become a new direction in the computer vision field. Compared with convolutional neural networks, the Transformer uses self-attention to capture global contextual information and extract stronger features by learning the relationships between different features, and it has achieved good results in many vision tasks. In face-based age estimation, some facial patches that contain rich age-specific information are critical to the task. The present study proposed an attention-based convolution (ABC) age estimation framework, called improved Swin Transformer with ABC, in which two separate modules were implemented, namely ABC and the Swin Transformer. ABC extracted facial patches containing rich age-specific information using a shallow convolutional network and a multi-head attention mechanism. The features obtained by ABC were then spliced with the flattened image tokens and input to the Swin Transformer to predict the age of the image. The ABC framework spliced the important regions containing rich age-specific information onto the original image, which fully mobilized the long-range dependency modeling of the Swin Transformer, that is, extracting stronger features by learning the dependency relationships between different features. ABC also introduced a diversity loss to guide the training of the self-attention mechanism, reducing overlap between patches so that diverse and important patches were discovered. Extensive experiments showed that the proposed framework outperformed several state-of-the-art methods on age estimation benchmark datasets.
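The abstract mentions a diversity loss that reduces overlap between the attended facial patches but does not give its formula. One common way to express such a penalty is the mean pairwise inner product of the K attention distributions: it is zero when the patches are disjoint and grows as they overlap. The sketch below is illustrative only; the function name and exact formulation are assumptions, not taken from the paper.

```python
def diversity_loss(attn_maps):
    """Mean pairwise inner product of K attention distributions.

    attn_maps: list of K equal-length lists of non-negative weights,
    each summing to 1 (e.g. softmax outputs over image locations).
    Returns 0.0 for disjoint maps; larger values mean more overlap,
    so minimizing this term pushes the K patches apart.
    """
    k = len(attn_maps)
    if k < 2:
        return 0.0
    total, pairs = 0.0, 0
    for i in range(k):
        for j in range(i + 1, k):
            total += sum(a * b for a, b in zip(attn_maps[i], attn_maps[j]))
            pairs += 1
    return total / pairs

# Two one-hot maps attending to different locations: no overlap
print(diversity_loss([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]))  # 0.0
# Two identical maps: maximal overlap for this spread
print(diversity_loss([[0.5, 0.5, 0.0], [0.5, 0.5, 0.0]]))  # 0.5
```

Added to a standard age-regression objective with a small weight, a term like this encourages the multi-head attention to pick out diverse, non-redundant facial regions, matching the behavior the abstract describes.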
first_indexed 2024-04-09T18:24:42Z
format Article
id doaj.art-c4f25cb4203e40d9af40615e195f23a8
institution Directory Open Access Journal
issn 1662-453X
language English
last_indexed 2024-04-09T18:24:42Z
publishDate 2023-04-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Neuroscience
spelling doaj.art-c4f25cb4203e40d9af40615e195f23a82023-04-12T05:00:58ZengFrontiers Media S.A.Frontiers in Neuroscience1662-453X2023-04-011710.3389/fnins.2023.11369341136934Face-based age estimation using improved Swin Transformer with attention-based convolutionChaojun Shi0Chaojun Shi1Shiwei Zhao2Ke Zhang3Ke Zhang4Yibo Wang5Longping Liang6Department of Electronic and Communication Engineering, North China Electric Power University, Baoding, Hebei, ChinaHebei Key Laboratory of Power Internet of Things Technology, North China Electric Power University, Baoding, Hebei, ChinaDepartment of Electronic and Communication Engineering, North China Electric Power University, Baoding, Hebei, ChinaDepartment of Electronic and Communication Engineering, North China Electric Power University, Baoding, Hebei, ChinaHebei Key Laboratory of Power Internet of Things Technology, North China Electric Power University, Baoding, Hebei, ChinaDepartment of Electronic and Communication Engineering, North China Electric Power University, Baoding, Hebei, ChinaDepartment of Electronic and Communication Engineering, North China Electric Power University, Baoding, Hebei, ChinaTransformer models, built on the multi-head self-attention mechanism, have recently become a new direction in the computer vision field. Compared with convolutional neural networks, the Transformer uses self-attention to capture global contextual information and extract stronger features by learning the relationships between different features, and it has achieved good results in many vision tasks. In face-based age estimation, some facial patches that contain rich age-specific information are critical to the task. The present study proposed an attention-based convolution (ABC) age estimation framework, called improved Swin Transformer with ABC, in which two separate modules were implemented, namely ABC and the Swin Transformer. 
ABC extracted facial patches containing rich age-specific information using a shallow convolutional network and a multi-head attention mechanism. The features obtained by ABC were then spliced with the flattened image tokens and input to the Swin Transformer to predict the age of the image. The ABC framework spliced the important regions containing rich age-specific information onto the original image, which fully mobilized the long-range dependency modeling of the Swin Transformer, that is, extracting stronger features by learning the dependency relationships between different features. ABC also introduced a diversity loss to guide the training of the self-attention mechanism, reducing overlap between patches so that diverse and important patches were discovered. Extensive experiments showed that the proposed framework outperformed several state-of-the-art methods on age estimation benchmark datasets.https://www.frontiersin.org/articles/10.3389/fnins.2023.1136934/fullage estimationSwin Transformerattention mechanismdeep learningneural networks
spellingShingle Chaojun Shi
Chaojun Shi
Shiwei Zhao
Ke Zhang
Ke Zhang
Yibo Wang
Longping Liang
Face-based age estimation using improved Swin Transformer with attention-based convolution
Frontiers in Neuroscience
age estimation
Swin Transformer
attention mechanism
deep learning
neural networks
title Face-based age estimation using improved Swin Transformer with attention-based convolution
title_full Face-based age estimation using improved Swin Transformer with attention-based convolution
title_fullStr Face-based age estimation using improved Swin Transformer with attention-based convolution
title_full_unstemmed Face-based age estimation using improved Swin Transformer with attention-based convolution
title_short Face-based age estimation using improved Swin Transformer with attention-based convolution
title_sort face based age estimation using improved swin transformer with attention based convolution
topic age estimation
Swin Transformer
attention mechanism
deep learning
neural networks
url https://www.frontiersin.org/articles/10.3389/fnins.2023.1136934/full
work_keys_str_mv AT chaojunshi facebasedageestimationusingimprovedswintransformerwithattentionbasedconvolution
AT chaojunshi facebasedageestimationusingimprovedswintransformerwithattentionbasedconvolution
AT shiweizhao facebasedageestimationusingimprovedswintransformerwithattentionbasedconvolution
AT kezhang facebasedageestimationusingimprovedswintransformerwithattentionbasedconvolution
AT kezhang facebasedageestimationusingimprovedswintransformerwithattentionbasedconvolution
AT yibowang facebasedageestimationusingimprovedswintransformerwithattentionbasedconvolution
AT longpingliang facebasedageestimationusingimprovedswintransformerwithattentionbasedconvolution