Balinese Script Recognition Using Tesseract Mobile Framework

Bibliographic Details
Main Authors: Gede Indrawan, Ahmad Asroni, Luh Joni Erawati Dewi, I Gede Aris Gunadi, I Ketut Paramarta
Format: Article
Language: English
Published: Udayana University, Institute for Research and Community Services 2022-11-01
Series: Lontar Komputer
Online Access: https://ojs.unud.ac.id/index.php/lontar/article/view/92159
author Gede Indrawan
Ahmad Asroni
Luh Joni Erawati Dewi
I Gede Aris Gunadi
I Ketut Paramarta
collection DOAJ
description One of the main factors behind the decline in the use of Balinese Script is that Balinese people have little interest in reading it, owing to a reluctance to learn a script that is relatively complicated to recognize. Computer technology can help through automated character recognition, known as Optical Character Recognition (OCR). Developing an OCR application for Balinese Script is an effort, from the technology side, to help preserve the script and to provide a means of education about it. In this study, the application was developed with the Tesseract OCR engine in four stages: (1) preparing the dataset, (2) generating the dataset using the Web Scraping method, (3) training the OCR engine on the generated dataset, and (4) implementing the resulting language model in a mobile application. The results show that generating the dataset with the Web Scraping method can be a better choice when training requires a large dataset, compared with several previous studies of non-Latin character recognition. Those studies used the jTessBox tools, which were time-consuming because each character had to be selected individually to build the dataset. The best language model was trained on a hierarchical combination of character, word, sentence, and paragraph datasets and reached a coincidence rate of 66.67%. The more diverse and hierarchically structured the datasets used, the higher the coincidence rate.
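The record does not define how the coincidence rate is computed. As a minimal sketch, assuming it is the percentage of test items whose recognized text exactly matches the expected text, it could be calculated as below. The function name `coincidence_rate` and the sample strings are hypothetical placeholders, not taken from the article.

```python
def coincidence_rate(expected, recognized):
    """Percentage of items whose recognized text equals the expected text.

    Assumption: "coincidence" means an exact match after trimming
    surrounding whitespace. The article may use a different definition.
    """
    if len(expected) != len(recognized):
        raise ValueError("expected and recognized must have the same length")
    if not expected:
        raise ValueError("at least one test item is required")
    matches = sum(
        1 for exp, rec in zip(expected, recognized) if exp.strip() == rec.strip()
    )
    return 100.0 * matches / len(expected)


if __name__ == "__main__":
    # Placeholder strings standing in for ground truth and OCR output.
    expected = ["sample-a", "sample-b", "sample-c"]
    recognized = ["sample-a", "sample-x", "sample-c"]
    print(f"{coincidence_rate(expected, recognized):.2f}%")
```

With the placeholder strings, two of the three items match. A character-level or edit-distance measure would be an equally plausible reading of the term.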
first_indexed 2024-04-11T05:51:49Z
format Article
id doaj.art-1b9ec4938e854f10a7893e977dbdc939
institution Directory Open Access Journal
issn 2088-1541
2541-5832
language English
last_indexed 2024-04-11T05:51:49Z
publishDate 2022-11-01
publisher Udayana University, Institute for Research and Community Services
record_format Article
series Lontar Komputer
spelling doaj.art-1b9ec4938e854f10a7893e977dbdc939
2022-12-22T04:42:03Z
eng
Udayana University, Institute for Research and Community Services
Lontar Komputer
2088-1541
2541-5832
2022-11-01
vol. 13, no. 3, pp. 160-171
10.24843/LKJITI.2022.v13.i03.p03
92159
Balinese Script Recognition Using Tesseract Mobile Framework
Gede Indrawan (Universitas Pendidikan Ganesha)
Ahmad Asroni (Department of Electrical Engineering and Computer Science, Universitas Pendidikan Ganesha)
Luh Joni Erawati Dewi (Department of Electrical Engineering and Computer Science, Universitas Pendidikan Ganesha)
I Gede Aris Gunadi (Department of Electrical Engineering and Computer Science, Universitas Pendidikan Ganesha)
I Ketut Paramarta (Department of Balinese Language Education, Universitas Pendidikan Ganesha)
https://ojs.unud.ac.id/index.php/lontar/article/view/92159
title Balinese Script Recognition Using Tesseract Mobile Framework
url https://ojs.unud.ac.id/index.php/lontar/article/view/92159