Binarization and Segmentation Framework for Sundanese Ancient Documents
Binarization and segmentation are the first two important steps in an optical character recognition system. For handwritten ancient document images, binarization remains a major challenge. In general, this occurs because the image quality is badly degraded and has v...
Main Authors: | Erick Paulus, Mira Suryani, Setiawan Hadi, Rahmat Sopian, Akik Hidayat |
---|---|
Format: | Article |
Language: | Indonesian |
Published: | Universitas Negeri Yogyakarta, 2017-11-01 |
Series: | Jurnal Sains Dasar |
Subjects: | binarization, segmentation, ancient document |
Online Access: | https://journal.uny.ac.id/index.php/jsd/article/view/15314 |
_version_ | 1819262507295965184 |
---|---|
author | Erick Paulus, Mira Suryani, Setiawan Hadi, Rahmat Sopian, Akik Hidayat |
author_sort | Erick Paulus |
collection | DOAJ |
description | Binarization and segmentation are the first two important steps in an optical character recognition (OCR) system. For handwritten ancient document images, binarization remains a major challenge, generally because the image quality is badly degraded and the non-text area contains various kinds of noise. After binarization, line-based segmentation is performed to separate each text line from the others. We propose a novel binarization and segmentation framework that enhances the performance of the Niblack binarization method and uses the minimum of an energy function to find the path of the separator line between two text lines. For the experiments, we use 22 images from the Sundanese ancient documents Kropak 18 and Kropak 22. The evaluation metrics show that the proposed binarization improves the F-measure over the original Niblack method by 20% for Kropak 22 and by 50% for Kropak 18. We then present the influence of different input images, both true color and binary, on text-line segmentation. In line segmentation, the binarized image from the proposed framework yields the same number of text lines as the number of target lines. Overall, the proposed framework produces promising results, so its output can be used as input for the subsequent OCR process. |
first_indexed | 2024-12-23T19:58:47Z |
format | Article |
id | doaj.art-134c8a97d61d4de391770d97a872e7a4 |
institution | Directory Open Access Journal |
issn | 2085-9872 2443-1273 |
language | Indonesian |
last_indexed | 2024-12-23T19:58:47Z |
publishDate | 2017-11-01 |
publisher | Universitas Negeri Yogyakarta |
record_format | Article |
series | Jurnal Sains Dasar |
spelling | doaj.art-134c8a97d61d4de391770d97a872e7a4; 2022-12-21T17:33:09Z; ind; Universitas Negeri Yogyakarta; Jurnal Sains Dasar; 2085-9872, 2443-1273; 2017-11-01; 6(2), 133-142; 10.21831/j. saind dasar.v6i2.15314; 9654; Binarization and Segmentation Framework for Sundanese Ancient Documents; Erick Paulus (Department of Computer Science, Universitas Padjadjaran, Indonesia); Mira Suryani (Department of Computer Science, Universitas Padjadjaran, Indonesia); Setiawan Hadi (Department of Computer Science, Universitas Padjadjaran, Indonesia); Rahmat Sopian (Sundanese Culture Studies, Universitas Padjadjaran, Indonesia); Akik Hidayat (Department of Computer Science, Universitas Padjadjaran, Indonesia); https://journal.uny.ac.id/index.php/jsd/article/view/15314; binarization, segmentation, ancient document |
title | Binarization and Segmentation Framework for Sundanese Ancient Documents |
topic | binarization, segmentation, ancient document |
url | https://journal.uny.ac.id/index.php/jsd/article/view/15314 |
work_keys_str_mv | AT erickpaulus binarizationandsegmentationframeworkforsundaneseancientdocuments AT mirasuryani binarizationandsegmentationframeworkforsundaneseancientdocuments AT setiawanhadi binarizationandsegmentationframeworkforsundaneseancientdocuments AT rahmatsopian binarizationandsegmentationframeworkforsundaneseancientdocuments AT akikhidayat binarizationandsegmentationframeworkforsundaneseancientdocuments |
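
The description in this record names three building blocks: Niblack binarization, a minimum-energy path for the separator line between text lines, and the F-measure used for evaluation. The following is a minimal Python sketch of the textbook versions of these ideas, not the authors' framework. The record does not give the window size, the constant `k`, or the definition of the energy function, so `window=15`, `k=-0.2`, and the use of an arbitrary energy array are assumptions, and the function names `box_mean`, `niblack`, `separator_seam`, and `f_measure` are hypothetical. The path search is a seam-carving-style dynamic program chosen for illustration.

```python
import numpy as np


def box_mean(a: np.ndarray, w: int) -> np.ndarray:
    """Mean over each w-by-w window (w odd), using an integral image."""
    r = w // 2
    padded = np.pad(a.astype(np.float64), r, mode="reflect")
    # Integral image with a leading row and column of zeros.
    ii = np.pad(padded, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    h, wd = a.shape
    total = (ii[w:w + h, w:w + wd] - ii[:h, w:w + wd]
             - ii[w:w + h, :wd] + ii[:h, :wd])
    return total / (w * w)


def niblack(gray: np.ndarray, window: int = 15, k: float = -0.2) -> np.ndarray:
    """Baseline Niblack: a pixel is text if gray < local_mean + k * local_std.

    window and k are assumed values, not taken from the record.
    """
    g = gray.astype(np.float64)
    mean = box_mean(g, window)
    sq_mean = box_mean(g ** 2, window)
    std = np.sqrt(np.maximum(sq_mean - mean ** 2, 0.0))
    return g < mean + k * std


def separator_seam(energy: np.ndarray) -> np.ndarray:
    """Row index per column of a left-to-right path with minimum total energy.

    The path may move at most one row up or down between adjacent columns.
    """
    h, w = energy.shape
    cost = energy.astype(np.float64).copy()
    for x in range(1, w):
        prev = cost[:, x - 1]
        up = np.concatenate(([np.inf], prev[:-1]))
        down = np.concatenate((prev[1:], [np.inf]))
        cost[:, x] += np.minimum(prev, np.minimum(up, down))
    rows = np.empty(w, dtype=int)
    rows[-1] = int(np.argmin(cost[:, -1]))
    # Backtrack from the cheapest end point to the first column.
    for x in range(w - 1, 0, -1):
        y = rows[x]
        lo, hi = max(y - 1, 0), min(y + 1, h - 1)
        rows[x - 1] = lo + int(np.argmin(cost[lo:hi + 1, x - 1]))
    return rows


def f_measure(pred: np.ndarray, truth: np.ndarray) -> float:
    """Harmonic mean of precision and recall over text pixels."""
    tp = np.logical_and(pred, truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return float(2 * precision * recall / (precision + recall))
```

In practice the energy passed to `separator_seam` would be restricted to the band between two neighbouring text lines, for example a smoothed copy of the binarized text mask, so that the cheapest path runs through the gap rather than along the page margin.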