Lexicon Reduction for Urdu/Arabic Script Based Character Recognition: A Multilingual OCR

Arabic script character recognition is a challenging task due to the complexity of the script and the huge number of ligatures. We present a method for the development of a multilingual Arabic script OCR (Optical Character Recognition) system and for lexicon reduction for Arabic script and its derivative languages. The ob...

Bibliographic Details
Main Authors: Saeeda Naz, Arif Iqbal Umar, Muhammad Imran Razzak
Format: Article
Language: English
Published: Mehran University of Engineering and Technology 2016-04-01
Series: Mehran University Research Journal of Engineering and Technology
Subjects: Urdu Optical Character Recognition; Multilingual Optical Character Recognition; Naskh; Nasta’liq
Online Access: http://publications.muet.edu.pk/research_papers/pdf/pdf1277.pdf
_version_ 1811313538131558400
author Saeeda Naz
Arif Iqbal Umar
Muhammad Imran Razzak
collection DOAJ
description Arabic script character recognition is a challenging task due to the complexity of the script and the huge number of ligatures. We present a method for the development of a multilingual Arabic script OCR (Optical Character Recognition) system and for lexicon reduction for Arabic script and its derivative languages. The objective of the proposed method is to overcome the large dataset required for Urdu and similar scripts by using the GCT (Ghost Character Theory) concept. Arabic and its sibling script languages share a similar character set, i.e., their character sets differ mainly in diacritics and in writing styles such as Naskh or Nasta’liq. Based on the proposed method, the lexicon for Arabic and Arabic script based languages can be reduced by up to approximately 20 times. The proposed multilingual Arabic script OCR approach has been evaluated on online Arabic and derivative languages such as Urdu using a BPNN. The results showed that the proposed method helps not only to reduce the lexicon but also to develop a multilanguage character recognition system for Arabic script.
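The lexicon-reduction idea in the description can be illustrated with a minimal Python sketch. It assumes that a "ghost character" is the base shape shared by letters that differ only in their dots or diacritic marks. The grouping table, the group labels, and the function names below are hypothetical and chosen for illustration; they are not the authors' published mapping, and the sketch covers isolated letters only, not ligatures.

```python
# Hypothetical ghost-character table: each label stands for one base shape,
# and the string lists Arabic/Urdu letters assumed to share that shape.
GHOST_GROUPS = {
    "BEH": "\u0628\u067e\u062a\u0679\u062b",   # beh, peh, teh, tteh, theh
    "HAH": "\u062c\u0686\u062d\u062e",         # jeem, tcheh, hah, khah
    "DAL": "\u062f\u0688\u0630",               # dal, ddal, thal
    "REH": "\u0631\u0691\u0632\u0698",         # reh, rreh, zain, jeh
    "SEEN": "\u0633\u0634",                    # seen, sheen
}

# Reverse lookup: letter -> ghost label.
CHAR_TO_GHOST = {
    ch: ghost for ghost, letters in GHOST_GROUPS.items() for ch in letters
}


def ghost_of(ch):
    """Return the ghost label for a letter, or the letter itself if unmapped."""
    return CHAR_TO_GHOST.get(ch, ch)


def to_ghost_sequence(word):
    """Map every character of a word to its ghost label."""
    return [ghost_of(ch) for ch in word]


def reduction_ratio():
    """Number of distinct letters divided by number of ghost shapes."""
    return len(CHAR_TO_GHOST) / len(GHOST_GROUPS)


if __name__ == "__main__":
    # Two strings that differ only in dots collapse to the same ghost sequence.
    print(to_ghost_sequence("\u0628\u0631"))
    print(to_ghost_sequence("\u067e\u0691"))
    print(reduction_ratio())
```

In this toy table, 18 letters collapse to 5 base shapes, so a recognizer would only need to learn the base shapes and could resolve the diacritics in a separate step. The ratio produced here applies to the toy table only and is unrelated to the reduction reported in the description.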
first_indexed 2024-04-13T10:55:32Z
format Article
id doaj.art-4fb9088eddea441586f2dadb83d728d0
institution Directory Open Access Journal
issn 0254-7821
2413-7219
language English
last_indexed 2024-04-13T10:55:32Z
publishDate 2016-04-01
publisher Mehran University of Engineering and Technology
record_format Article
series Mehran University Research Journal of Engineering and Technology
spelling doaj.art-4fb9088eddea441586f2dadb83d728d0
2022-12-22T02:49:32Z
Mehran University Research Journal of Engineering and Technology, 35(2), 209-216
Saeeda Naz: Department of Information Technology, Hazara University, Mansehra, KPK, and Government Post-Graduate Girls College No. 1, Higher Education Department, Abbottabad, KPK, Pakistan
Arif Iqbal Umar: Department of Information Technology, Hazara University, Mansehra, KPK, Pakistan
Muhammad Imran Razzak: King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia
title Lexicon Reduction for Urdu/Arabic Script Based Character Recognition: A Multilingual OCR
topic Urdu Optical Character Recognition
Multilingual Optical Character Recognition
Naskh
Nasta’liq
url http://publications.muet.edu.pk/research_papers/pdf/pdf1277.pdf