LW-ViT: The Lightweight Vision Transformer Model Applied in Offline Handwritten Chinese Character Recognition
In recent years, the transformer model has been widely used in computer-vision tasks and has achieved impressive results. Unfortunately, these transformer-based models share the drawback of having many parameters and a large memory footprint, making them difficult to deploy on mobile devices in the way lightweight convolutional neural networks are. To address these issues, a lightweight Vision Transformer (LW-ViT) model is proposed and applied to offline handwritten Chinese character recognition.
Main Authors: | Shiyong Geng, Zongnan Zhu, Zhida Wang, Yongping Dan, Hengyi Li |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-04-01 |
Series: | Electronics |
Subjects: | transformer-based models; lightweight vision transformer (LW-ViT); offline handwritten Chinese character recognition; MV2 layer |
Online Access: | https://www.mdpi.com/2079-9292/12/7/1693 |
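The percentage reductions quoted in the abstract can be cross-checked against the absolute figures it reports. The sketch below (plain Python; the MobileViT baseline values are inferred from the stated reductions, not given in the record) derives the implied baseline:

```python
# Reported LW-ViT figures (from the abstract)
lw_params_m = 0.48   # million parameters
lw_flops_g = 0.22    # GFLOPs

# Reported reductions relative to MobileViT
param_reduction = 0.538  # 53.8% fewer parameters
flop_reduction = 0.185   # 18.5% fewer FLOPs

# Implied MobileViT baseline (inferred, not stated in the record)
base_params_m = lw_params_m / (1 - param_reduction)
base_flops_g = lw_flops_g / (1 - flop_reduction)

print(f"Implied MobileViT parameters: {base_params_m:.2f} M")  # ~1.04 M
print(f"Implied MobileViT FLOPs: {base_flops_g:.2f} G")        # ~0.27 G
```

The implied baseline of roughly 1.04 M parameters is in the range of the smallest MobileViT variants, which is consistent with the abstract's claim that LW-ViT trims blocks from the MobileViT framework.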
---|---|
author | Shiyong Geng; Zongnan Zhu; Zhida Wang; Yongping Dan; Hengyi Li |
collection | DOAJ |
description | In recent years, the transformer model has been widely used in computer-vision tasks and has achieved impressive results. Unfortunately, these transformer-based models share the drawback of having many parameters and a large memory footprint, making them difficult to deploy on mobile devices in the way lightweight convolutional neural networks are. To address these issues, a Vision Transformer (ViT) model, named the lightweight Vision Transformer (LW-ViT), is proposed to reduce the complexity of transformer-based models, and it is applied to offline handwritten Chinese character recognition. The design of the LW-ViT model is inspired by MobileViT: it reduces the number of parameters and FLOPs by reducing the number of transformer blocks and MV2 layers within the overall framework of the MobileViT model. The LW-ViT model has 0.48 million parameters and 0.22 G FLOPs, and it ultimately achieved a high recognition accuracy of 95.8% on the dataset. Compared to the MobileViT model, the number of parameters was reduced by 53.8% and the FLOPs by 18.5%. The experimental results show that the LW-ViT model has a low number of parameters, demonstrating the feasibility of the proposed model. |
id | doaj.art-7303cf5fe63946be85106e2b033d7647 |
institution | Directory Open Access Journal |
issn | 2079-9292 |
doi | 10.3390/electronics12071693 |
citation | Electronics, vol. 12, no. 7, art. 1693, published 2023-04-01 by MDPI AG |
author affiliations | Shiyong Geng, Zongnan Zhu, Zhida Wang, Yongping Dan: School of Electronic and Information, Zhongyuan University of Technology, Zhengzhou 451191, China. Hengyi Li: Graduate School of Science and Engineering, Ritsumeikan University, 1-1-1 Noji-higashi, Kusatsu 525-8577, Japan |
title | LW-ViT: The Lightweight Vision Transformer Model Applied in Offline Handwritten Chinese Character Recognition |
topic | transformer-based models; lightweight vision transformer (LW-ViT); offline handwritten Chinese character recognition; MV2 layer |