Binarization and Segmentation Framework for Sundanese Ancient Documents

Binarization and segmentation are the first two important steps in an optical character recognition (OCR) system. For handwritten ancient document images, binarization remains a major challenge. In general, this is because the image quality is badly degraded and has v...


Bibliographic Details
Main Authors: Erick Paulus, Mira Suryani, Setiawan Hadi, Rahmat Sopian, Akik Hidayat
Format: Article
Language: Indonesian
Published: Universitas Negeri Yogyakarta 2017-11-01
Series: Jurnal Sains Dasar
Subjects: binarization, segmentation, ancient document
Online Access: https://journal.uny.ac.id/index.php/jsd/article/view/15314
collection DOAJ
description Binarization and segmentation are the first two important steps in an optical character recognition (OCR) system. For handwritten ancient document images, binarization remains a major challenge, generally because the images are badly degraded and contain various kinds of noise in the non-text areas. After binarization, line-based segmentation is performed to separate each text line from the others. We propose a novel binarization and segmentation framework that enhances the performance of the Niblack binarization method and uses the minimum of an energy function to find the path of the separator line between two text lines. For the experiments, we use 22 images from the Sundanese ancient documents Kropak 18 and Kropak 22. The evaluation metrics show that our proposed binarization improves the F-measure by 20% for Kropak 22 and by 50% for Kropak 18 over the original Niblack method. We then present the influence of different input images, both true color and binary, on text-line segmentation. In line segmentation, the binarized image from our proposed framework yields the same number of text lines as the number of target lines. Overall, the proposed framework produces promising results, so its output can be used as input for the subsequent OCR process.
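For orientation, the baseline that the abstract refers to, the original Niblack method, thresholds each pixel against its local mean and standard deviation: T = m + k * s. The sketch below shows only that standard baseline, not the authors' enhanced framework, which the abstract does not detail. The function name `niblack_binarize`, the window size, and the value k = -0.2 are illustrative assumptions.

```python
from math import sqrt


def niblack_binarize(image, window=3, k=-0.2):
    """Baseline Niblack thresholding on a grayscale image (list of rows).

    Returns a binary image where 1 marks a text (dark) pixel and 0 marks
    background. `window` and `k` are assumed, illustrative defaults.
    """
    height, width = len(image), len(image[0])
    radius = window // 2
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Gather the local neighbourhood, clipped at the image borders.
            values = [
                image[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            mean = sum(values) / len(values)
            variance = sum((v - mean) ** 2 for v in values) / len(values)
            # Niblack threshold: local mean plus k times local std deviation.
            threshold = mean + k * sqrt(variance)
            result[y][x] = 1 if image[y][x] < threshold else 0
    return result
```

With a negative k, a pixel is marked as text only when it is clearly darker than its neighbourhood, so the result depends strongly on the chosen window size and k.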
first_indexed 2024-12-23T19:58:47Z
format Article
id doaj.art-134c8a97d61d4de391770d97a872e7a4
institution Directory Open Access Journal
issn 2085-9872
2443-1273
language Indonesian
last_indexed 2024-12-23T19:58:47Z
publishDate 2017-11-01
publisher Universitas Negeri Yogyakarta
record_format Article
series Jurnal Sains Dasar
spelling doaj.art-134c8a97d61d4de391770d97a872e7a4 2022-12-21T17:33:09Z ind
Universitas Negeri Yogyakarta, Jurnal Sains Dasar, ISSN 2085-9872, 2443-1273, 2017-11-01, vol. 6, no. 2, pp. 133-142
10.21831/j. saind dasar.v6i2.15314 9654
Erick Paulus (Department of Computer Science, Universitas Padjadjaran, Indonesia)
Mira Suryani (Department of Computer Science, Universitas Padjadjaran, Indonesia)
Setiawan Hadi (Department of Computer Science, Universitas Padjadjaran, Indonesia)
Rahmat Sopian (Sundanese Culture Studie, Universitas Padjadjaran, Indonesia)
Akik Hidayat (Department of Computer Science, Universitas Padjadjaran, Indonesia)
title Binarization and Segmentation Framework for Sundanese Ancient Documents
topic binarization, segmentation, ancient document
url https://journal.uny.ac.id/index.php/jsd/article/view/15314