Semantic Text Segmentation from Synthetic Images of Full-Text Documents

An algorithm for generating images of full-text documents is presented. These images can be used to train, test, and evaluate models for Optical Character Recognition (OCR). The algorithm is modular: individual parts can be changed and tweaked to generate desired i...

Bibliographic Details
Main Authors: Lukáš Bureš, Ivan Gruber, Petr Neduchal, Miroslav Hlaváč, Marek Hrúz
Format: Article
Language: English
Published: Russian Academy of Sciences, St. Petersburg Federal Research Center 2019-12-01
Series: Информатика и автоматизация
Online Access: http://ia.spcras.ru/index.php/sp/article/view/4527