Gradual OCR: An Effective OCR Approach Based on Gradual Detection of Texts

In this paper, we present a novel approach to optical character recognition that incorporates various supplementary techniques, including the gradual detection of texts and gradual filtering of inaccurately recognized texts. To minimize false negatives, we attempt to detect all text by incrementally...

Bibliographic Details
Main Authors: Youngki Park, Youhyun Shin
Format: Article
Language: English
Published: MDPI AG 2023-11-01
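Series: Mathematics
Subjects: optical character recognition; gradual OCR; gradual text detection; gradual low-quality filtering
Online Access: https://www.mdpi.com/2227-7390/11/22/4585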
_version_ 1797458497532592128
author Youngki Park
Youhyun Shin
author_facet Youngki Park
Youhyun Shin
author_sort Youngki Park
collection DOAJ
description In this paper, we present a novel approach to optical character recognition that incorporates various supplementary techniques, including the gradual detection of texts and gradual filtering of inaccurately recognized texts. To minimize false negatives, we attempt to detect all text by incrementally lowering the relevant thresholds. To mitigate false positives, we implement a novel filtering method that dynamically adjusts based on the confidence levels of recognized texts and their corresponding detection thresholds. Additionally, we use straightforward yet effective strategies to enhance the optical character recognition accuracy and speed, such as upscaling, link refinement, perspective transformation, the merging of cropped images, and simple autoregression. Given our focus on Korean chart data, we compile a mix of real-world and artificial Korean chart datasets for experimentation. Our experimental results show that our approach outperforms Tesseract by approximately 7 to 15 times and EasyOCR by 3 to 5 times in accuracy, as measured using a Jaccard similarity-based error rate on our datasets.
first_indexed 2024-03-09T16:37:57Z
format Article
id doaj.art-30ce04eb7f9e438ba1e018e12a5eb7d6
institution Directory Open Access Journal
issn 2227-7390
language English
last_indexed 2024-03-09T16:37:57Z
publishDate 2023-11-01
publisher MDPI AG
record_format Article
series Mathematics
spelling doaj.art-30ce04eb7f9e438ba1e018e12a5eb7d6
2023-11-24T14:54:05Z
eng
MDPI AG
Mathematics
2227-7390
2023-11-01
11 22 4585
10.3390/math11224585
Gradual OCR: An Effective OCR Approach Based on Gradual Detection of Texts
Youngki Park 0
Youhyun Shin 1
Department of Computer Education, Chuncheon National University of Education, Chuncheon 24328, Republic of Korea
Department of Computer Science and Engineering, Incheon National University, Incheon 22012, Republic of Korea
In this paper, we present a novel approach to optical character recognition that incorporates various supplementary techniques, including the gradual detection of texts and gradual filtering of inaccurately recognized texts. To minimize false negatives, we attempt to detect all text by incrementally lowering the relevant thresholds. To mitigate false positives, we implement a novel filtering method that dynamically adjusts based on the confidence levels of recognized texts and their corresponding detection thresholds. Additionally, we use straightforward yet effective strategies to enhance the optical character recognition accuracy and speed, such as upscaling, link refinement, perspective transformation, the merging of cropped images, and simple autoregression. Given our focus on Korean chart data, we compile a mix of real-world and artificial Korean chart datasets for experimentation. Our experimental results show that our approach outperforms Tesseract by approximately 7 to 15 times and EasyOCR by 3 to 5 times in accuracy, as measured using a Jaccard similarity-based error rate on our datasets.
https://www.mdpi.com/2227-7390/11/22/4585
optical character recognition
gradual OCR
gradual text detection
gradual low-quality filtering
spellingShingle Youngki Park
Youhyun Shin
Gradual OCR: An Effective OCR Approach Based on Gradual Detection of Texts
Mathematics
optical character recognition
gradual OCR
gradual text detection
gradual low-quality filtering
title Gradual OCR: An Effective OCR Approach Based on Gradual Detection of Texts
title_full Gradual OCR: An Effective OCR Approach Based on Gradual Detection of Texts
title_fullStr Gradual OCR: An Effective OCR Approach Based on Gradual Detection of Texts
title_full_unstemmed Gradual OCR: An Effective OCR Approach Based on Gradual Detection of Texts
title_short Gradual OCR: An Effective OCR Approach Based on Gradual Detection of Texts
title_sort gradual ocr an effective ocr approach based on gradual detection of texts
topic optical character recognition
gradual OCR
gradual text detection
gradual low-quality filtering
url https://www.mdpi.com/2227-7390/11/22/4585
work_keys_str_mv AT youngkipark gradualocraneffectiveocrapproachbasedongradualdetectionoftexts
AT youhyunshin gradualocraneffectiveocrapproachbasedongradualdetectionoftexts
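
Illustrative sketch (not part of the catalog record, and not the authors' implementation): the description above mentions detecting text at incrementally lowered thresholds, filtering results by recognition confidence relative to the detection threshold, and scoring with a Jaccard similarity-based error rate. The Python below shows one way those ideas could fit together. Every name in it (`gradual_ocr`, `detect_and_recognize`, `base_confidence`, `fake_detector`), the linear rule for raising the required confidence, and the set-based definition of the error rate are assumptions made for illustration; the record does not specify any of them.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical result type: (region id, recognized text, recognition confidence).
Detection = Tuple[str, str, float]


def gradual_ocr(
    detect_and_recognize: Callable[[float], Iterable[Detection]],
    thresholds: List[float],
    base_confidence: float = 0.5,
) -> List[Detection]:
    """Detect at progressively lower thresholds and filter the results.

    Assumed filtering rule: the lower the detection threshold that produced
    a region, the higher the recognition confidence it must reach.
    """
    kept: Dict[str, Detection] = {}
    strictest = max(thresholds)
    for threshold in sorted(thresholds, reverse=True):
        # Required confidence grows linearly as the threshold is relaxed.
        required = min(1.0, base_confidence + (strictest - threshold))
        for region_id, text, confidence in detect_and_recognize(threshold):
            # Regions accepted at a stricter threshold are not revisited.
            if region_id not in kept and confidence >= required:
                kept[region_id] = (region_id, text, confidence)
    return list(kept.values())


def jaccard_error_rate(predicted: Iterable[str], expected: Iterable[str]) -> float:
    """One minus the Jaccard similarity of the two sets of text strings."""
    p, e = set(predicted), set(expected)
    if not p and not e:
        return 0.0  # two empty results are treated as a perfect match
    return 1.0 - len(p & e) / len(p | e)


def fake_detector(threshold: float) -> List[Detection]:
    """Stand-in detector with made-up regions: (id, text, confidence, score)."""
    regions = [
        ("a", "매출", 0.95, 0.7),
        ("b", "2023", 0.90, 0.4),
        ("c", "~~", 0.30, 0.2),
    ]
    return [(rid, text, conf) for rid, text, conf, score in regions if score >= threshold]


results = gradual_ocr(fake_detector, thresholds=[0.7, 0.4, 0.2])
texts = [text for _, text, _ in results]
error = jaccard_error_rate(texts, ["매출", "2023", "합계"])
print(texts, round(error, 3))
```

In this toy run the low-confidence region found only at the loosest threshold is discarded, which is the false-positive control the description refers to; a real system would replace `fake_detector` with an actual text detector and recognizer.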