CNN-optimized text recognition with binary embeddings for Arabic expiry date recognition

Abstract Recognizing Arabic dot-matrix digits is a challenging problem due to the unique characteristics of dot-matrix fonts, such as irregular dot spacing and varying dot sizes. This paper presents an approach for recognizing Arabic digits printed in dot matrix format. The proposed model is based o...

Bibliographic Details
Main Authors: Mohamed Lotfy, Ghada Soliman
Format: Article
Language: English
Published: SpringerOpen 2024-02-01
Series: Journal of Electrical Systems and Information Technology
Online Access: https://doi.org/10.1186/s43067-024-00136-2
collection DOAJ
description Abstract: Recognizing Arabic dot-matrix digits is challenging because of the characteristics of dot-matrix fonts, such as irregular dot spacing and varying dot sizes. This paper presents an approach for recognizing Arabic digits printed in dot-matrix format. The proposed model is based on convolutional neural networks (CNNs) that take the dot matrix as input and generate embeddings, which are rounded to produce binary representations of the digits. The binary embeddings are then used to perform Optical Character Recognition (OCR) on the date images. To overcome the limited availability of dotted Arabic expiration date images, we developed a TrueType font (TTF) for generating synthetic images of Arabic dot-matrix characters. The model was trained on 3287 synthetic images and tested on 658 synthetic images, representing realistic expiration dates from 2019 to 2027 in the formats yyyy/mm/dd and yy/mm/dd. Our model achieved an accuracy of 98.94% on expiry date recognition in Arabic dot-matrix format while using fewer parameters and fewer computational resources than traditional CNN-based models. By investigating and presenting our findings comprehensively, we aim to contribute to the field of OCR and pave the way for advancements in Arabic dot-matrix character recognition. Our proposed approach is not limited to Arabic dot-matrix digit recognition and can also be extended to text recognition tasks, such as text classification and sentiment analysis.
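The abstract describes embeddings that are rounded into binary digit representations and then read out as a date. A minimal sketch of that read-out step follows, under stated assumptions: the 4-bit code per digit, the 0.5 threshold, and the names `binarize`, `decode_digits`, and `format_date` are all hypothetical illustrations, since the record does not give the paper's actual encoding or network.

```python
# Hypothetical sketch: the 4-bit-per-digit code is an assumption for
# illustration, not the encoding used in the paper.
BITS = 4


def digit_to_code(d):
    """Return the big-endian 4-bit tuple for a digit 0-9, e.g. 2 -> (0, 0, 1, 0)."""
    return tuple((d >> i) & 1 for i in reversed(range(BITS)))


# Lookup table from binary code to digit character.
CODEBOOK = {digit_to_code(d): str(d) for d in range(10)}


def binarize(embedding):
    """Threshold each component at 0.5 (components assumed to lie in [0, 1])."""
    return tuple(1 if x >= 0.5 else 0 for x in embedding)


def decode_digits(embeddings):
    """Map one real-valued embedding per character to a digit string.

    Codes outside the codebook (10-15) are rendered as '?'.
    """
    return "".join(CODEBOOK.get(binarize(e), "?") for e in embeddings)


def format_date(digits):
    """Insert separators for the two formats named in the abstract."""
    if len(digits) == 8:  # yyyy/mm/dd
        return f"{digits[:4]}/{digits[4:6]}/{digits[6:]}"
    if len(digits) == 6:  # yy/mm/dd
        return f"{digits[:2]}/{digits[2:4]}/{digits[4:]}"
    raise ValueError("expected 6 or 8 digits")


if __name__ == "__main__":
    # Made-up noisy embeddings for the six digits 2, 4, 0, 1, 3, 1.
    noisy = [
        [0.1, 0.2, 0.9, 0.3],
        [0.0, 0.8, 0.1, 0.4],
        [0.2, 0.1, 0.3, 0.0],
        [0.1, 0.0, 0.2, 0.7],
        [0.3, 0.1, 0.6, 0.9],
        [0.0, 0.2, 0.1, 0.8],
    ]
    print(format_date(decode_digits(noisy)))
```

An explicit threshold is used here instead of Python's built-in `round`, which rounds halves to the nearest even integer and would map 0.5 to 0.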
first_indexed 2024-03-07T15:15:05Z
id doaj.art-7bc11b12c7b24a20b69b3ea0bf2ef86b
institution Directory Open Access Journal
issn 2314-7172
last_indexed 2024-03-07T15:15:05Z
spelling doaj.art-7bc11b12c7b24a20b69b3ea0bf2ef86b; 2024-03-05T17:58:07Z; eng; SpringerOpen; Journal of Electrical Systems and Information Technology; 2314-7172; 2024-02-01; 111118; 10.1186/s43067-024-00136-2; CNN-optimized text recognition with binary embeddings for Arabic expiry date recognition; Mohamed Lotfy (Department Software Engineering, Kafr El-Sheikh University); Ghada Soliman (PhD, Department Environmental Engineering, Ain Shams University); https://doi.org/10.1186/s43067-024-00136-2