A Holistic Technique for an Arabic OCR System

Analytical approaches in Optical Character Recognition (OCR) systems can suffer from a significant number of segmentation errors, especially when dealing with cursive languages such as Arabic, in which characters frequently overlap. Holistic approaches that consider whole wo...

Bibliographic Details
Main Authors: Farhan M. A. Nashwan, Mohsen A. A. Rashwan, Hassanin M. Al-Barhamtoshy, Sherif M. Abdou, Abdullah M. Moussa
Format: Article
Language: English
Published: MDPI AG 2017-12-01
Series: Journal of Imaging
Subjects: Arabic OCR systems; holistic OCR approach; holistic OCR features; lexicon reduction
Online Access: https://www.mdpi.com/2313-433X/4/1/6
collection DOAJ
description Analytical approaches in Optical Character Recognition (OCR) systems can suffer from a significant number of segmentation errors, especially when dealing with cursive languages such as Arabic, in which characters frequently overlap. Holistic approaches that treat whole words as single units were introduced as an effective way to avoid such segmentation errors. Still, the main challenge for these approaches is their computational complexity, especially in large-vocabulary applications. In this paper, we introduce a computationally efficient, holistic Arabic OCR system. A lexicon reduction approach based on clustering similarly shaped words is used to reduce recognition time. Using global word-level Discrete Cosine Transform (DCT) based features in combination with local block-based features, our proposed approach managed to generalize to new font sizes that were not included in the training data. Evaluation results for the approach using different test sets from modern and historical Arabic books are promising compared with state-of-the-art Arabic OCR systems.
first_indexed 2024-12-13T06:56:28Z
format Article
id doaj.art-9bf40d43c3104df7aba9bc2892410662
institution Directory Open Access Journal
issn 2313-433X
language English
last_indexed 2024-12-13T06:56:28Z
spelling doaj.art-9bf40d43c3104df7aba9bc2892410662
2022-12-21T23:56:02Z
eng
MDPI AG
Journal of Imaging
2313-433X
2017-12-01
Volume 4, Issue 1, Article 6
10.3390/jimaging4010006
jimaging4010006
A Holistic Technique for an Arabic OCR System
Farhan M. A. Nashwan (0)
Mohsen A. A. Rashwan (1)
Hassanin M. Al-Barhamtoshy (2)
Sherif M. Abdou (3)
Abdullah M. Moussa (4)
(0) Department of Electronics and Electrical Communications, Cairo University, Giza 12613, Egypt
(1) Department of Electronics and Electrical Communications, Cairo University, Giza 12613, Egypt
(2) Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
(3) Faculty of Computers & Information, Cairo University, Giza 12613, Egypt
(4) Department of Electronics and Electrical Communications, Cairo University, Giza 12613, Egypt
https://www.mdpi.com/2313-433X/4/1/6
Arabic OCR systems
holistic OCR approach
holistic OCR features
lexicon reduction
topic Arabic OCR systems
holistic OCR approach
holistic OCR features
lexicon reduction
url https://www.mdpi.com/2313-433X/4/1/6