Semantic Text Segmentation from Synthetic Images of Full-Text Documents
An algorithm for generating images of full-text documents is presented. These images can be used to train, test, and evaluate models for Optical Character Recognition (OCR). The algorithm is modular: individual parts can be changed and tweaked to generate desired i...
Main Authors: | Lukáš Bureš, Ivan Gruber, Petr Neduchal, Miroslav Hlaváč, Marek Hrúz |
---|---|
Format: | Article |
Language: | English |
Published: | Russian Academy of Sciences, St. Petersburg Federal Research Center, 2019-12-01 |
Series: | Информатика и автоматизация |
Subjects: | |
Online Access: | http://ia.spcras.ru/index.php/sp/article/view/4527 |
Similar Items
- Efficient Text Bounding Box Identification Using Mask R-CNN: Case of Thai Documents
  by: Phanthakan Kiatphaisansophon, et al.
  Published: (2024-01-01)
- Basic Test Framework for the Evaluation of Text Line Segmentation and Text Parameter Extraction
  by: Darko Brodić, et al.
  Published: (2010-05-01)
- Utilization of OCR and text feature extraction to create a database of labour complaints
  by: Yan Puspitarani, et al.
  Published: (2020-08-01)
- The Giles Ecosystem – Storage, Text Extraction, and OCR of Documents
  by: Julia Damerow, et al.
  Published: (2017-09-01)
- Analysis of Recent Deep Learning Techniques for Arabic Handwritten-Text OCR and Post-OCR Correction
  by: Rayyan Najam, et al.
  Published: (2023-06-01)