Persian Optical Character Recognition Using Deep Bidirectional Long Short-Term Memory
Optical Character Recognition (OCR) is a system for converting images that contain text into editable text, and it is applied to various languages such as English, Arabic, and Persian. While these languages have similarities, their fundamental differences can create unique challenges. In Persian, continuit...
Main Authors: | Zohreh Khosrobeigi, Hadi Veisi, Ehsan Hoseinzade, Hanieh Shabanian |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-11-01 |
Series: | Applied Sciences |
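Subjects: | Optical Character Recognition (OCR); Long Short-Term Memory (LSTM); Bidirectional LSTM (BLSTM); Convolution Neural Network (CNN); Persian language |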
Online Access: | https://www.mdpi.com/2076-3417/12/22/11760 |
_version_ | 1797465904094642176 |
---|---|
author | Zohreh Khosrobeigi; Hadi Veisi; Ehsan Hoseinzade; Hanieh Shabanian |
author_sort | Zohreh Khosrobeigi |
collection | DOAJ |
description | Optical Character Recognition (OCR) is a system for converting images that contain text into editable text, and it is applied to various languages such as English, Arabic, and Persian. While these languages have similarities, their fundamental differences can create unique challenges. In Persian, the continuity between characters, the presence of semicircles, dots, and obliques, and left-to-right content such as English words embedded in the text are some of the most important challenges in designing Persian OCR systems. Our proposed framework, Bina, is designed to address the issue of continuity by combining a Convolution Neural Network (CNN) with a deep bidirectional Long Short-Term Memory (BLSTM), a type of LSTM network that has access to both past and future context. A large and diverse dataset of about 2M samples, covering both Persian and English contexts in various fonts and sizes, is also generated to train and test the proposed model. Various configurations are tested to find the optimal structure of the CNN and BLSTM. The results show that Bina outperforms a state-of-the-art baseline algorithm, achieving about 96% accuracy in the Persian context and 88% accuracy in the Persian and English context. |
first_indexed | 2024-03-09T18:29:06Z |
format | Article |
id | doaj.art-30b5f8998b6e41b3857760fd50e76722 |
institution | Directory of Open Access Journals |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-09T18:29:06Z |
publishDate | 2022-11-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-30b5f8998b6e41b3857760fd50e76722; 2023-11-24T07:40:30Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2022-11-01; vol. 12, no. 22, art. 11760; DOI 10.3390/app122211760; Persian Optical Character Recognition Using Deep Bidirectional Long Short-Term Memory; Zohreh Khosrobeigi (School of Computer Science and Statistics, Trinity College Dublin, D02 YY50 Dublin, Ireland); Hadi Veisi (Faculty of New Sciences and Technologies, University of Tehran, Tehran P.O. Box 14399-56191, Iran); Ehsan Hoseinzade (School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada); Hanieh Shabanian (Computer Science Department, School of Computing and Analytics, Northern Kentucky University, Highland Heights, KY 41076, USA); https://www.mdpi.com/2076-3417/12/22/11760; Optical Character Recognition (OCR); Long Short-Term Memory (LSTM); Bidirectional LSTM (BLSTM); Convolution Neural Network (CNN); Persian language |
title | Persian Optical Character Recognition Using Deep Bidirectional Long Short-Term Memory |
topic | Optical Character Recognition (OCR); Long Short-Term Memory (LSTM); Bidirectional LSTM (BLSTM); Convolution Neural Network (CNN); Persian language |
url | https://www.mdpi.com/2076-3417/12/22/11760 |
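
The description field characterizes a BLSTM as an LSTM that has access to both past and future context. The sketch below illustrates only that property and is not the Bina architecture: it uses a scalar LSTM cell, omits the CNN stage and any training, and the function names (`lstm_pass`, `blstm`) and the toy weights are hypothetical choices made for this example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gate_input(w, name, x, h):
    """Affine gate input: input weight * x + recurrent weight * h + bias."""
    w_x, w_h, b = w[name]
    return w_x * x + w_h * h + b


def lstm_pass(xs, w):
    """Run a scalar LSTM cell over xs; return the hidden state at every step."""
    h, c = 0.0, 0.0
    states = []
    for x in xs:
        i = sigmoid(gate_input(w, "i", x, h))    # input gate
        f = sigmoid(gate_input(w, "f", x, h))    # forget gate
        o = sigmoid(gate_input(w, "o", x, h))    # output gate
        g = math.tanh(gate_input(w, "g", x, h))  # candidate cell value
        c = f * c + i * g                        # updated cell state
        h = o * math.tanh(c)                     # updated hidden state
        states.append(h)
    return states


def blstm(xs, w_fwd, w_bwd):
    """Pair a left-to-right pass with a right-to-left pass at each position."""
    fwd = lstm_pass(xs, w_fwd)
    bwd = lstm_pass(xs[::-1], w_bwd)[::-1]  # re-align with the input order
    return list(zip(fwd, bwd))


# Hypothetical toy weights: (input weight, recurrent weight, bias) per gate.
TOY_W_FWD = {"i": (0.5, 0.3, 0.1), "f": (0.4, 0.2, 0.5),
             "o": (0.6, 0.1, 0.0), "g": (0.9, 0.4, 0.0)}
TOY_W_BWD = {"i": (0.7, 0.2, 0.0), "f": (0.3, 0.3, 0.4),
             "o": (0.5, 0.2, 0.1), "g": (0.8, 0.5, 0.0)}

if __name__ == "__main__":
    for fwd_h, bwd_h in blstm([0.5, -1.0, 0.25, 2.0], TOY_W_FWD, TOY_W_BWD):
        print(f"forward={fwd_h:+.4f}  backward={bwd_h:+.4f}")
```

Changing only the last input leaves the forward state at the first position untouched but alters the backward state there, which is the sense in which each position sees future as well as past context. In an OCR setting the inputs would be feature vectors rather than scalars; that mapping is an assumption here, not something the record specifies.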