Optical character recognition system for Baybayin scripts using support vector machine

In 2018, the Philippine Congress signed House Bill 1022 declaring the Baybayin script as the Philippines’ national writing system. In this regard, it is highly probable that the Baybayin and Latin scripts would appear in a single document. In this work, we propose a system that discriminates the cha...

Bibliographic Details
Main Authors: Rodney Pino, Renier Mendoza, Rachelle Sambayan
Format: Article
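Language: English
Published: PeerJ Inc. 2021-02-01
Series: PeerJ Computer Science
Subjects: Baybayin, Latin script identification, Baybayin script identification, Support vector machine, Optical character recognition
Online Access: https://peerj.com/articles/cs-360.pdf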
author Rodney Pino
Renier Mendoza
Rachelle Sambayan
collection DOAJ
description In 2018, the Philippine Congress signed House Bill 1022 declaring the Baybayin script as the Philippines’ national writing system. In this regard, it is highly probable that the Baybayin and Latin scripts would appear in a single document. In this work, we propose a system that discriminates the characters of both scripts. The proposed system considers the normalization of an individual character to identify if it belongs to Baybayin or Latin script and further classify them as to what unit they represent. This gives us four classification problems, namely: (1) Baybayin and Latin script recognition, (2) Baybayin character classification, (3) Latin character classification, and (4) Baybayin diacritical marks classification. To the best of our knowledge, this is the first study that makes use of Support Vector Machine (SVM) for Baybayin script recognition. This work also provides a new dataset for Baybayin, its diacritics, and Latin characters. Classification problems (1) and (4) use binary SVM while (2) and (3) apply the multiclass SVM classification. On average, our numerical experiments yield satisfactory results: (1) has 98.5% accuracy, 98.5% precision, 98.49% recall, and 98.5% F1 Score; (2) has 96.51% accuracy, 95.62% precision, 95.61% recall, and 95.62% F1 Score; (3) has 95.8% accuracy, 95.85% precision, 95.8% recall, and 95.83% F1 Score; and (4) has 100% accuracy, 100% precision, 100% recall, and 100% F1 Score.
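The four metrics quoted in the description (accuracy, precision, recall, F1 score) can be illustrated with a minimal Python sketch. It assumes macro-averaging over classes, which the record does not state, so the function name `macro_metrics` and the averaging choice are illustrative assumptions rather than the authors' evaluation code. The toy labels are invented for the example and are not data from the paper.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Assumption: per-class scores are averaged with equal weight (macro
    averaging). The source record does not say which averaging was used.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1      # correct prediction for this class
        else:
            fp[pred] += 1       # predicted class gains a false positive
            fn[truth] += 1      # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for c in labels:
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy labels for a binary script-recognition task like problem (1).
y_true = ["baybayin", "baybayin", "latin", "latin"]
y_pred = ["baybayin", "latin", "latin", "latin"]
scores = macro_metrics(y_true, y_pred)
print(scores)
```

Under macro-averaging, the F1 score is the mean of the per-class F1 values, so it need not equal the harmonic mean of the averaged precision and recall.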
first_indexed 2024-12-20T10:33:14Z
format Article
id doaj.art-9598dac29dc544e7b584499934b0b8e0
institution Directory of Open Access Journals
issn 2376-5992
language English
last_indexed 2024-12-20T10:33:14Z
publishDate 2021-02-01
publisher PeerJ Inc.
series PeerJ Computer Science
spelling doaj.art-9598dac29dc544e7b584499934b0b8e0; 2022-12-21T19:43:41Z; eng; PeerJ Inc.; PeerJ Computer Science; ISSN 2376-5992; 2021-02-01; volume 7, article e360; DOI 10.7717/peerj-cs.360
Optical character recognition system for Baybayin scripts using support vector machine
Rodney Pino, Renier Mendoza, Rachelle Sambayan (all three: Institute of Mathematics, University of the Philippines Diliman, Quezon City, Metro Manila, Philippines)
https://peerj.com/articles/cs-360.pdf
Keywords: Baybayin; Latin script identification; Baybayin script identification; Support vector machine; Optical character recognition
title Optical character recognition system for Baybayin scripts using support vector machine
topic Baybayin
Latin script identification
Baybayin script identification
Support vector machine
Optical character recognition
url https://peerj.com/articles/cs-360.pdf