Efficient Text Bounding Box Identification Using Mask R-CNN: Case of Thai Documents

Text detection is a fundamental task in computer vision, particularly for Optical Character Recognition (OCR) applications. This study addresses text detection within an OCR application, which encompasses text detection, text recognition, and information extraction...

Bibliographic Details
Main Authors:	Phanthakan Kiatphaisansophon, Dittaya Wanvarie, Nagul Cooharojananone
Format:	Article
Language:	English
Published:	IEEE 2024-01-01
Series:	IEEE Access
Subjects:	Deep learning; text detection; optical character recognition (OCR)
Online Access:	https://ieeexplore.ieee.org/document/10487930/

_version_	1797217400506023936
author	Phanthakan Kiatphaisansophon; Dittaya Wanvarie; Nagul Cooharojananone
author_facet	Phanthakan Kiatphaisansophon; Dittaya Wanvarie; Nagul Cooharojananone
author_sort	Phanthakan Kiatphaisansophon
collection	DOAJ
description	Text detection is a fundamental task in computer vision, particularly for Optical Character Recognition (OCR) applications. This study addresses text detection within an OCR application, which encompasses text detection, text recognition, and information extraction. Character-Region Awareness for Text Detection (CRAFT), Pyramid Mask Text Detector (PMTD), and Scene Text Detection with Supervised Pyramid Context Network (SPCNET) have demonstrated promising results in bounding-box detection. However, they face challenges related to post-processing and multiline text detection. The post-processing problem arises because the model must be reconfigured whenever new document types are introduced, which leads to inefficiency and complexity. In addition, bounding boxes from consecutive lines tend to be merged, introducing multiline errors, especially for CRAFT. To address these challenges, this study proposes an adapted approach based on Mask R-CNN, an instance segmentation model that treats each text element as an individual object. Adopting the Mask R-CNN approach eliminates the post-processing issues and effectively resolves the multiline problem. Comparative experiments demonstrate that the proposed model achieves results comparable to those of these models while surpassing them in accuracy and versatility. The proposed model is extensively evaluated on various document types, including bankbooks, Thai ID cards (both front and back sides), invoices, car registrations, mobile banking slips, passports, Indonesian ID cards, driver licenses, and receipts. The results indicate the model’s high performance and potential for real-world applications. Eliminating the post-processing and multiline problems ensures the model’s adaptability to a wide range of document structures and reduces both inference time and resource utilization.
first_indexed	2024-04-24T12:01:15Z
format	Article
id	doaj.art-ce70e55f84394743a22db74e484aecae
institution	Directory Open Access Journal
issn	2169-3536
language	English
last_indexed	2024-04-24T12:01:15Z
publishDate	2024-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-ce70e55f84394743a22db74e484aecae; 2024-04-08T23:00:39Z; eng; IEEE; IEEE Access; 2169-3536; 2024-01-01; 12; 49306; 49328; 10.1109/ACCESS.2024.3383911; 10487930; Efficient Text Bounding Box Identification Using Mask R-CNN: Case of Thai Documents; Phanthakan Kiatphaisansophon [0] https://orcid.org/0009-0004-2172-454X; Dittaya Wanvarie [1] https://orcid.org/0000-0002-4007-1124; Nagul Cooharojananone [2] https://orcid.org/0000-0003-4023-5165; [0] Faculty of Science, Chulalongkorn University, Bangkok, Thailand; [1] Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Bangkok, Thailand; [2] Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Bangkok, Thailand; https://ieeexplore.ieee.org/document/10487930/; Deep learning; text detection; optical character recognition (OCR)
spellingShingle	Phanthakan Kiatphaisansophon Dittaya Wanvarie Nagul Cooharojananone Efficient Text Bounding Box Identification Using Mask R-CNN: Case of Thai Documents IEEE Access Deep learning text detection optical character recognition (OCR)
title	Efficient Text Bounding Box Identification Using Mask R-CNN: Case of Thai Documents
title_full	Efficient Text Bounding Box Identification Using Mask R-CNN: Case of Thai Documents
title_fullStr	Efficient Text Bounding Box Identification Using Mask R-CNN: Case of Thai Documents
title_full_unstemmed	Efficient Text Bounding Box Identification Using Mask R-CNN: Case of Thai Documents
title_short	Efficient Text Bounding Box Identification Using Mask R-CNN: Case of Thai Documents
title_sort	efficient text bounding box identification using mask r cnn case of thai documents
topic	Deep learning; text detection; optical character recognition (OCR)
url	https://ieeexplore.ieee.org/document/10487930/
work_keys_str_mv	AT phanthakankiatphaisansophon efficienttextboundingboxidentificationusingmaskrcnncaseofthaidocuments AT dittayawanvarie efficienttextboundingboxidentificationusingmaskrcnncaseofthaidocuments AT nagulcooharojananone efficienttextboundingboxidentificationusingmaskrcnncaseofthaidocuments

Similar Items