CNN-optimized text recognition with binary embeddings for Arabic expiry date recognition
Abstract Recognizing Arabic dot-matrix digits is a challenging problem due to the unique characteristics of dot-matrix fonts, such as irregular dot spacing and varying dot sizes. This paper presents an approach for recognizing Arabic digits printed in dot matrix format. The proposed model is based o...
Main Authors: | Mohamed Lotfy, Ghada Soliman |
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen 2024-02-01 |
Series: | Journal of Electrical Systems and Information Technology |
Online Access: | https://doi.org/10.1186/s43067-024-00136-2 |
_version_ | 1827328256516292608 |
---|---|
author | Mohamed Lotfy; Ghada Soliman |
author_facet | Mohamed Lotfy; Ghada Soliman |
author_sort | Mohamed Lotfy |
collection | DOAJ |
description | Abstract: Recognizing Arabic dot-matrix digits is challenging because of the characteristics of dot-matrix fonts, such as irregular dot spacing and varying dot sizes. This paper presents an approach for recognizing Arabic digits printed in dot-matrix format. The proposed model is based on convolutional neural networks (CNNs) that take the dot matrix as input and generate embeddings, which are rounded to produce binary representations of the digits. The binary embeddings are then used to perform optical character recognition (OCR) on the date images. To overcome the limited availability of dotted Arabic expiration date images, we developed a TrueType font (TTF) for generating synthetic images of Arabic dot-matrix characters. The model was trained on 3287 synthetic images and tested on 658 synthetic images, representing realistic expiration dates from 2019 to 2027 in the formats yyyy/mm/dd and yy/mm/dd. Our model achieved an accuracy of 98.94% on expiry date recognition in Arabic dot-matrix format while using fewer parameters and fewer computational resources than traditional CNN-based models. By investigating and presenting our findings comprehensively, we aim to contribute substantially to the field of OCR and pave the way for advancements in Arabic dot-matrix character recognition. Our proposed approach is not limited to Arabic dot-matrix digit recognition but can also be extended to text recognition tasks, such as text classification and sentiment analysis. |
first_indexed | 2024-03-07T15:15:05Z |
format | Article |
id | doaj.art-7bc11b12c7b24a20b69b3ea0bf2ef86b |
institution | Directory of Open Access Journals |
issn | 2314-7172 |
language | English |
last_indexed | 2024-03-07T15:15:05Z |
publishDate | 2024-02-01 |
publisher | SpringerOpen |
record_format | Article |
series | Journal of Electrical Systems and Information Technology |
spelling | doaj.art-7bc11b12c7b24a20b69b3ea0bf2ef86b; 2024-03-05T17:58:07Z; eng; SpringerOpen; Journal of Electrical Systems and Information Technology; 2314-7172; 2024-02-01; 111118; 10.1186/s43067-024-00136-2; CNN-optimized text recognition with binary embeddings for Arabic expiry date recognition; Mohamed Lotfy 0; Ghada Soliman 1; Department Software Engineering, Kafr El-Sheikh University; PhD, Department Environmental Engineering, Ain Shams University; https://doi.org/10.1186/s43067-024-00136-2 |
spellingShingle | Mohamed Lotfy Ghada Soliman CNN-optimized text recognition with binary embeddings for Arabic expiry date recognition Journal of Electrical Systems and Information Technology |
title | CNN-optimized text recognition with binary embeddings for Arabic expiry date recognition |
title_full | CNN-optimized text recognition with binary embeddings for Arabic expiry date recognition |
title_fullStr | CNN-optimized text recognition with binary embeddings for Arabic expiry date recognition |
title_full_unstemmed | CNN-optimized text recognition with binary embeddings for Arabic expiry date recognition |
title_short | CNN-optimized text recognition with binary embeddings for Arabic expiry date recognition |
title_sort | cnn optimized text recognition with binary embeddings for arabic expiry date recognition |
url | https://doi.org/10.1186/s43067-024-00136-2 |
work_keys_str_mv | AT mohamedlotfy cnnoptimizedtextrecognitionwithbinaryembeddingsforarabicexpirydaterecognition AT ghadasoliman cnnoptimizedtextrecognitionwithbinaryembeddingsforarabicexpirydaterecognition |
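
The abstract describes two steps that a small example may make more concrete: rounding real-valued embeddings into binary digit codes, and rendering expiry dates in the yyyy/mm/dd and yy/mm/dd formats. The sketch below is illustrative only and is not the authors' implementation. This record does not give the paper's actual code layout, so the 4-bit-per-digit code, the 0.5 rounding threshold, the use of Arabic-Indic glyphs, and the names `digit_to_code`, `decode_embedding`, and `format_expiry` are all assumptions.

```python
from datetime import date

# Assumption: Arabic-Indic glyphs; the record does not say which glyph set is used.
ARABIC_INDIC = "٠١٢٣٤٥٦٧٨٩"
BITS_PER_DIGIT = 4  # hypothetical code width, enough to cover the digits 0-9


def digit_to_code(digit: int) -> list[int]:
    """Hypothetical target code: the digit's value as 4 bits, most significant first."""
    return [(digit >> shift) & 1 for shift in range(BITS_PER_DIGIT - 1, -1, -1)]


def decode_embedding(embedding: list[float]) -> str:
    """Round each real-valued output to 0 or 1, then read one digit per 4 bits."""
    bits = [1 if value >= 0.5 else 0 for value in embedding]
    digits = []
    for start in range(0, len(bits), BITS_PER_DIGIT):
        group = bits[start:start + BITS_PER_DIGIT]
        value = int("".join(map(str, group)), 2)
        # Codes 10-15 match no digit under this assumed layout; mark them unreadable.
        digits.append(ARABIC_INDIC[value] if value < 10 else "?")
    return "".join(digits)


def format_expiry(day: date, short_year: bool = False) -> str:
    """Render a date as yyyy/mm/dd or yy/mm/dd using Arabic-Indic digits."""
    pattern = "%y/%m/%d" if short_year else "%Y/%m/%d"
    text = day.strftime(pattern)
    return "".join(ARABIC_INDIC[int(ch)] if ch.isdigit() else ch for ch in text)
```

In this sketch, `decode_embedding` stands in for the step applied to the network's output, while `format_expiry` shows how label strings for synthetic date images could be produced before rendering them with a dot-matrix font.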