Two bigrams based language model for auto correction of Arabic OCR errors

In optical character recognition (OCR), the characteristics of Arabic text cause more errors than in English text. In this paper, a two bi-grams based language model that uses Wikipedia's database is presented. The method can perform auto detection and correction of non-word errors in Arabic OCR...

Bibliographic Details
Main Authors:	Habeeb, Imad Q.; Mohd Yusof, Shahrul Azmi; Ahmad, Faudziah
Format:	Article
Language:	English
Published:	AICIT, Korea 2014
Subjects:	QA76 Computer software
Online Access:	https://repo.uum.edu.my/id/eprint/12602/1/JDCTA3630PPL.pdf

_version_	1825803035621720064
author	Habeeb, Imad Q.; Mohd Yusof, Shahrul Azmi; Ahmad, Faudziah
author_facet	Habeeb, Imad Q.; Mohd Yusof, Shahrul Azmi; Ahmad, Faudziah
author_sort	Habeeb, Imad Q.
collection	UUM
description	In optical character recognition (OCR), the characteristics of Arabic text cause more errors than in English text. In this paper, a two bi-grams based language model that uses Wikipedia's database is presented. The method can perform auto detection and correction of non-word errors in Arabic OCR text, and auto detection of real-word errors. The method consists of two parts: extracting the context information from Wikipedia's database, and implementing the auto detection and correction of incorrect words. This method can be applied to any language with little modification. The experimental results show successful extraction of context information from Wikipedia's articles. Furthermore, they show that using this method can reduce the error rate of Arabic OCR text.
first_indexed	2024-07-04T05:50:20Z
format	Article
id	uum-12602
institution	Universiti Utara Malaysia
language	English
last_indexed	2024-07-04T05:50:20Z
publishDate	2014
publisher	AICIT, Korea
record_format	eprints
spelling	uum-12602 2016-05-15T01:07:50Z https://repo.uum.edu.my/id/eprint/12602/ AICIT, Korea 2014-02 Article PeerReviewed application/pdf en https://repo.uum.edu.my/id/eprint/12602/1/JDCTA3630PPL.pdf Habeeb, Imad Q. and Mohd Yusof, Shahrul Azmi and Ahmad, Faudziah (2014) Two bigrams based language model for auto correction of Arabic OCR errors. International Journal of Digital Content Technology and its Applications (JDCTA), 8 (1). pp. 72-80. ISSN 2233-9310 http://www.aicit.org/jdcta/global/paper_detail.html?jname=JDCTA&q=3630
title	Two bigrams based language model for auto correction of Arabic OCR errors
topic	QA76 Computer software
url	https://repo.uum.edu.my/id/eprint/12602/1/JDCTA3630PPL.pdf

Similar Items