LW-ViT: The Lightweight Vision Transformer Model Applied in Offline Handwritten Chinese Character Recognition

In recent years, transformer models have been widely used in computer-vision tasks and have achieved impressive results. Unfortunately, these transformer-based models share a common drawback: they have many parameters and a large memory footprint, which makes them difficult to deploy on mobile devices in the way that lightweight convolutional neural networks can be. To address these issues, a Vision Transformer (ViT) model, named the lightweight Vision Transformer (LW-ViT) model, is proposed to reduce the complexity of transformer-based models, and it is applied to offline handwritten Chinese character recognition. The design of LW-ViT is inspired by MobileViT: within the overall MobileViT framework, it reduces the number of parameters and FLOPs by reducing the number of transformer blocks and MV2 layers. The LW-ViT model has 0.48 million parameters and 0.22 GFLOPs, and it ultimately achieved a recognition accuracy of 95.8% on the dataset. Compared to the MobileViT model, the number of parameters is reduced by 53.8% and the FLOPs by 18.5%. The experimental results show that LW-ViT achieves a low parameter count, demonstrating the feasibility of the proposed model.
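The description above is architectural prose with no code. What follows is a minimal sketch, assuming PyTorch, of the pattern it names, not the authors' implementation: MobileNetV2 inverted-residual ("MV2") layers supply local features and a small transformer stack supplies global context, as in MobileViT, and LW-ViT's savings come from keeping both counts small. All layer counts, channel widths, and the 3755-class output are illustrative assumptions. For scale, the reported 53.8% parameter reduction down to 0.48 million parameters implies a MobileViT baseline of roughly 0.48 / (1 - 0.538), about 1.04 million parameters.

    # Minimal sketch (not the paper's code): an MV2 layer plus a small
    # transformer stage in the MobileViT style. LW-ViT's lever, per the
    # description, is simply using fewer MV2 layers and fewer transformer
    # blocks; every count and width below is an illustrative assumption.
    import torch
    import torch.nn as nn

    class MV2Block(nn.Module):
        """MobileNetV2 inverted residual: 1x1 expand -> 3x3 depthwise -> 1x1 project."""
        def __init__(self, c_in, c_out, expansion=4, stride=1):
            super().__init__()
            hidden = c_in * expansion
            self.use_residual = stride == 1 and c_in == c_out
            self.block = nn.Sequential(
                nn.Conv2d(c_in, hidden, 1, bias=False),
                nn.BatchNorm2d(hidden), nn.SiLU(),
                nn.Conv2d(hidden, hidden, 3, stride, 1, groups=hidden, bias=False),
                nn.BatchNorm2d(hidden), nn.SiLU(),
                nn.Conv2d(hidden, c_out, 1, bias=False),
                nn.BatchNorm2d(c_out),
            )

        def forward(self, x):
            out = self.block(x)
            return x + out if self.use_residual else out

    class MiniViTStage(nn.Module):
        """Local conv -> transformer over spatial positions -> fuse back.

        For brevity this attends over all positions instead of MobileViT's
        patch unfolding; the point here is the small n_blocks.
        """
        def __init__(self, channels, dim, n_blocks):
            super().__init__()
            self.local = nn.Conv2d(channels, dim, 3, padding=1, bias=False)
            layer = nn.TransformerEncoderLayer(
                d_model=dim, nhead=4, dim_feedforward=2 * dim, batch_first=True)
            self.transformer = nn.TransformerEncoder(layer, num_layers=n_blocks)
            self.fuse = nn.Conv2d(dim, channels, 1, bias=False)

        def forward(self, x):
            b, _, h, w = x.shape
            y = self.local(x)                   # (B, dim, H, W)
            seq = y.flatten(2).transpose(1, 2)  # (B, H*W, dim)
            seq = self.transformer(seq)
            y = seq.transpose(1, 2).reshape(b, -1, h, w)
            return self.fuse(y) + x             # residual fusion

    if __name__ == "__main__":
        model = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1),  # grayscale character stem
            MV2Block(16, 16),                      # fewer MV2 layers than MobileViT
            MiniViTStage(16, dim=32, n_blocks=1),  # fewer transformer blocks
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, 3755),  # 3755-class HCCR is an assumption; the
                                  # record does not name the dataset
        )
        x = torch.randn(1, 1, 64, 64)
        print(model(x).shape)  # torch.Size([1, 3755])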

Bibliographic Details
Main Authors: Shiyong Geng, Zongnan Zhu, Zhida Wang, Yongping Dan (School of Electronic and Information, Zhongyuan University of Technology, Zhengzhou 451191, China); Hengyi Li (Graduate School of Science and Engineering, Ritsumeikan University, 1-1-1 Noji-higashi, Kusatsu 525-8577, Japan)
Format: Article
Language: English
Published: MDPI AG, 2023-04-01
Series: Electronics
ISSN: 2079-9292
Subjects: transformer-based models; lightweight vision transformer (LW-ViT); offline handwritten Chinese character recognition; MV2 layer
DOI: 10.3390/electronics12071693
Online Access: https://www.mdpi.com/2079-9292/12/7/1693