Experimental evaluation of Arabic OCR systems

Bibliographic Details
Main Authors: Mansoor Alghamdi, William Teahan
Format: Article
Language: English
Published: Tsinghua University Press 2017-11-01
Series: International Journal of Crowd Science
Online Access: https://www.emeraldinsight.com/doi/pdfplus/10.1108/PRR-05-2017-0026
Description
Summary: Purpose – The aim of this paper is to experimentally evaluate the effectiveness of state-of-the-art printed Arabic text recognition systems in order to identify open areas for future improvement. In addition, this paper proposes a standard protocol with a set of metrics for measuring the effectiveness of Arabic optical character recognition (OCR) systems, to assist researchers in comparing different Arabic OCR approaches.
Design/methodology/approach – This paper describes an experiment that automatically evaluates four well-known Arabic OCR systems using a set of performance metrics. The experiment is conducted on a publicly available printed Arabic dataset comprising 240 text images with a variety of resolution levels, font types, font styles and font sizes.
Findings – The experimental results show that character recognition for printed Arabic still requires further research before an efficient text recognition method for Arabic script is reached.
Originality/value – To the best of the authors’ knowledge, this is the first work that provides a comprehensive automated evaluation of Arabic OCR systems with respect to the characteristics of Arabic script. It also proposes an evaluation methodology that researchers can use as a benchmark and that should therefore contribute to the advancement of Arabic script recognition.
ISSN: 2398-7294
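
The summary refers to a set of performance metrics without naming them. Purely as an illustration, the sketch below shows two measures that are commonly used when comparing OCR output against ground truth: character error rate and word error rate, both derived from Levenshtein edit distance. The choice of metrics and all function names are assumptions made for this example; they are not taken from the paper and may differ from the protocol it proposes.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum number of insertions, deletions and
    substitutions (each costing 1) needed to turn `ref` into `hyp`."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty hyp takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance over characters, normalised by the reference length.
    (Hypothetical helper; assumes a non-empty reference string.)"""
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def word_error_rate(ref: str, hyp: str) -> float:
    """Edit distance over whitespace-separated tokens, normalised by the
    number of reference tokens. (Hypothetical helper.)"""
    ref_words = ref.split()
    if not ref_words:
        raise ValueError("reference text must contain at least one word")
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


if __name__ == "__main__":
    # One substituted character out of four reference characters.
    print(character_error_rate("abcd", "abxd"))  # 0.25
    # Works the same way on Arabic strings: one dropped letter.
    print(character_error_rate("كتاب", "كتب"))
```

Because the functions operate on Unicode code points, results for Arabic text depend on how the ground truth and the OCR output are normalised (for example, whether diacritics are kept); any real evaluation would need to fix that convention first.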