A Holistic Technique for an Arabic OCR System
Analytical approaches in Optical Character Recognition (OCR) systems can suffer from a significant number of segmentation errors, especially when dealing with cursive languages such as Arabic, where characters frequently overlap. Holistic approaches that consider whole wo...
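Main Authors: | Farhan M. A. Nashwan, Mohsen A. A. Rashwan, Hassanin M. Al-Barhamtoshy, Sherif M. Abdou, Abdullah M. Moussa |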
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2017-12-01 |
Series: | Journal of Imaging |
Subjects: | Arabic OCR systems; holistic OCR approach; holistic OCR features; lexicon reduction |
Online Access: | https://www.mdpi.com/2313-433X/4/1/6 |
_version_ | 1828872565726117888 |
---|---|
author | Farhan M. A. Nashwan; Mohsen A. A. Rashwan; Hassanin M. Al-Barhamtoshy; Sherif M. Abdou; Abdullah M. Moussa |
author_sort | Farhan M. A. Nashwan |
collection | DOAJ |
description | Analytical approaches in Optical Character Recognition (OCR) systems can suffer from a significant number of segmentation errors, especially when dealing with cursive languages such as Arabic, where characters frequently overlap. Holistic approaches that consider whole words as single units were introduced as an effective way to avoid such segmentation errors. Still, the main challenge for these approaches is their computational complexity, especially when dealing with large-vocabulary applications. In this paper, we introduce a computationally efficient, holistic Arabic OCR system. A lexicon reduction approach based on clustering similarly shaped words is used to reduce recognition time. Using global word-level Discrete Cosine Transform (DCT) based features in combination with local block-based features, our proposed approach managed to generalize to new font sizes that were not included in the training data. Evaluation results for the approach using different test sets from modern and historical Arabic books are promising compared with state-of-the-art Arabic OCR systems. |
first_indexed | 2024-12-13T06:56:28Z |
format | Article |
id | doaj.art-9bf40d43c3104df7aba9bc2892410662 |
institution | Directory of Open Access Journals |
issn | 2313-433X |
language | English |
last_indexed | 2024-12-13T06:56:28Z |
publishDate | 2017-12-01 |
publisher | MDPI AG |
record_format | Article |
series | Journal of Imaging |
spelling | doaj.art-9bf40d43c3104df7aba9bc2892410662; 2022-12-21T23:56:02Z; eng; MDPI AG; Journal of Imaging; ISSN 2313-433X; 2017-12-01; vol. 4, no. 1, art. 6; DOI 10.3390/jimaging4010006; A Holistic Technique for an Arabic OCR System; Farhan M. A. Nashwan (Department of Electronics and Electrical Communications, Cairo University, Giza 12613, Egypt); Mohsen A. A. Rashwan (Department of Electronics and Electrical Communications, Cairo University, Giza 12613, Egypt); Hassanin M. Al-Barhamtoshy (Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia); Sherif M. Abdou (Faculty of Computers & Information, Cairo University, Giza 12613, Egypt); Abdullah M. Moussa (Department of Electronics and Electrical Communications, Cairo University, Giza 12613, Egypt); https://www.mdpi.com/2313-433X/4/1/6; Arabic OCR systems; holistic OCR approach; holistic OCR features; lexicon reduction |
title | A Holistic Technique for an Arabic OCR System |
topic | Arabic OCR systems; holistic OCR approach; holistic OCR features; lexicon reduction |
url | https://www.mdpi.com/2313-433X/4/1/6 |
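
The description field above mentions two ideas: global word-level DCT features, and lexicon reduction by clustering similarly shaped words. The following is a minimal sketch of how those two ideas might fit together, not the authors' implementation. Everything in it is an assumption made for illustration: the toy 4x4 binary word images, the choice of keeping a 3x3 block of low-frequency coefficients, the hand-built clusters, and the placeholder labels such as `word_h1`. The paper's local block-based features are not modelled here.

```python
import math

def dct2(image):
    """Naive orthonormal 2-D DCT-II of a list-of-lists grayscale image."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0.0
            for x in range(rows):
                for y in range(cols):
                    total += (image[x][y]
                              * math.cos(math.pi * (2 * x + 1) * u / (2 * rows))
                              * math.cos(math.pi * (2 * y + 1) * v / (2 * cols)))
            cu = math.sqrt((1 if u == 0 else 2) / rows)
            cv = math.sqrt((1 if v == 0 else 2) / cols)
            out[u][v] = cu * cv * total
    return out

def word_features(image, k=3):
    """Keep the k x k lowest-frequency coefficients as a global shape descriptor."""
    coeffs = dct2(image)
    return [coeffs[u][v] for u in range(k) for v in range(k)]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def reduce_lexicon(query, centroids, clusters, top_n=1):
    """Return only the lexicon entries in the top_n clusters nearest to the query."""
    order = sorted(range(len(centroids)), key=lambda i: sq_dist(query, centroids[i]))
    return [word for i in order[:top_n] for word in clusters[i]]

# Toy word shapes (assumption): a horizontal bar and a vertical bar.
HBAR = [[0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0]]
VBAR = [[0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0]]

# Hand-built clusters (assumption); a real system would learn them from training data.
centroids = [word_features(HBAR), word_features(VBAR)]
clusters = [["word_h1", "word_h2"], ["word_v1", "word_v2"]]

# A noisy horizontal bar: one extra pixel set in the top-left corner.
query_image = [[1, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0]]
candidates = reduce_lexicon(word_features(query_image), centroids, clusters)
print(candidates)
```

The point of the sketch is the cost saving: a full recognizer would then compare the query only against `candidates` rather than against every entry in the lexicon. Raising `top_n` trades speed for a lower chance of discarding the correct word.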