Gradual OCR: An Effective OCR Approach Based on Gradual Detection of Texts
Main Authors: | Youngki Park, Youhyun Shin |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-11-01 |
Series: | Mathematics |
Subjects: | |
Online Access: | https://www.mdpi.com/2227-7390/11/22/4585 |
author | Youngki Park, Youhyun Shin |
---|---|
collection | DOAJ |
description | In this paper, we present a novel approach to optical character recognition that incorporates various supplementary techniques, including the gradual detection of texts and gradual filtering of inaccurately recognized texts. To minimize false negatives, we attempt to detect all text by incrementally lowering the relevant thresholds. To mitigate false positives, we implement a novel filtering method that dynamically adjusts based on the confidence levels of recognized texts and their corresponding detection thresholds. Additionally, we use straightforward yet effective strategies to enhance the optical character recognition accuracy and speed, such as upscaling, link refinement, perspective transformation, the merging of cropped images, and simple autoregression. Given our focus on Korean chart data, we compile a mix of real-world and artificial Korean chart datasets for experimentation. Our experimental results show that our approach outperforms Tesseract by approximately 7 to 15 times and EasyOCR by 3 to 5 times in accuracy, as measured using a Jaccard similarity-based error rate on our datasets. |
first_indexed | 2024-03-09T16:37:57Z |
format | Article |
id | doaj.art-30ce04eb7f9e438ba1e018e12a5eb7d6 |
institution | Directory of Open Access Journals |
issn | 2227-7390 |
language | English |
last_indexed | 2024-03-09T16:37:57Z |
publishDate | 2023-11-01 |
publisher | MDPI AG |
record_format | Article |
series | Mathematics |
doi | 10.3390/math11224585 |
citation | Mathematics, vol. 11, no. 22, art. 4585 (2023) |
affiliations | Youngki Park: Department of Computer Education, Chuncheon National University of Education, Chuncheon 24328, Republic of Korea; Youhyun Shin: Department of Computer Science and Engineering, Incheon National University, Incheon 22012, Republic of Korea |
title | Gradual OCR: An Effective OCR Approach Based on Gradual Detection of Texts |
topic | optical character recognition; gradual OCR; gradual text detection; gradual low-quality filtering |
url | https://www.mdpi.com/2227-7390/11/22/4585 |
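The abstract describes three ideas that a small sketch can make concrete: detecting text at incrementally lowered thresholds to reduce false negatives, filtering results with a rule that depends on both recognition confidence and the detection threshold, and scoring output with a Jaccard similarity-based error rate. The Python below is a minimal illustration under stated assumptions, not the authors' implementation. This record does not specify the detector, the threshold schedule, the filter formula, or the exact error definition, so every name here (`gradual_ocr`, `min_conf`, `toy_detect`, `TOY`), the IoU-based duplicate check, the `1 - t` confidence rule, and the set-of-strings Jaccard form are hypothetical choices made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), hypothetical layout
Result = Tuple[Box, str, float]          # (box, recognized text, confidence)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = w * h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def gradual_ocr(detect: Callable[[float], List[Result]],
                thresholds: List[float],
                min_conf: Callable[[float], float],
                max_overlap: float = 0.5) -> List[Result]:
    """Run detection at progressively lower thresholds (fewer misses) and
    keep a new region only if it does not duplicate a region already kept
    and its confidence clears a bar that depends on the threshold."""
    kept: List[Result] = []
    for t in sorted(thresholds, reverse=True):  # strictest threshold first
        for box, text, conf in detect(t):
            if any(iou(box, k[0]) > max_overlap for k in kept):
                continue  # already found at a stricter threshold
            if conf >= min_conf(t):  # hypothetical threshold-aware filter
                kept.append((box, text, conf))
    return kept


def jaccard_error(predicted: List[str], truth: List[str]) -> float:
    """1 - (size of intersection / size of union) over sets of strings.
    This set-based form is an assumption, not the paper's definition."""
    p, g = set(predicted), set(truth)
    if not p and not g:
        return 0.0
    return 1.0 - len(p & g) / len(p | g)


# Toy stand-in for a real text detector + recognizer (hypothetical data).
TOY: Dict[float, List[Result]] = {
    0.7: [((0, 0, 10, 5), "Sales", 0.95)],
    0.5: [((0, 0, 10, 5), "Sales", 0.90), ((20, 0, 30, 5), "2023", 0.85)],
    0.3: [((40, 0, 50, 5), "x#", 0.40)],
}


def toy_detect(threshold: float) -> List[Result]:
    return TOY.get(threshold, [])


if __name__ == "__main__":
    # Regions that appear only at looser thresholds must be more confident.
    results = gradual_ocr(toy_detect, [0.7, 0.5, 0.3], lambda t: 1.0 - t)
    texts = [text for _, text, _ in results]
    print(texts)  # ['Sales', '2023']
    print(jaccard_error(texts, ["Sales", "2023", "Revenue"]))
```

In this toy run, "Sales" is kept at the strictest threshold, its repeat at 0.5 is skipped as a duplicate, "2023" is recovered only when the threshold drops, and the low-confidence "x#" found at 0.3 is filtered out because looser thresholds demand higher confidence. Tying the confidence bar to the threshold is one plausible reading of the abstract's "dynamically adjusts" filter; a real system would tune both the schedule and the rule on its own data.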