Semantic Text Segmentation from Synthetic Images of Full-Text Documents

An algorithm (divided into multiple modules) for generating images of full-text documents is presented. These images can be used to train, test, and evaluate models for Optical Character Recognition (OCR). The algorithm is modular: individual parts can be changed and tweaked to generate desired i...
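
The abstract above (truncated in this record) describes a modular generator whose parts can be swapped to produce training images for OCR and semantic text segmentation. The following is a minimal sketch of that general idea, not the authors' algorithm. It assumes a page is a grayscale image plus a per-pixel label mask, and that each module is a function mutating the page. Dark bars stand in for rendered glyphs, and plain random noise stands in for aging effects (the record's keywords mention a variational autoencoder for aged-looking text, which this sketch does not implement). All names (`Page`, `place_text_lines`, `add_noise`, `generate`) are hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List

BACKGROUND, TEXT = 0, 1  # hypothetical label set for the segmentation mask


@dataclass
class Page:
    width: int
    height: int
    image: List[List[int]] = field(init=False)  # grayscale, 255 = blank paper
    mask: List[List[int]] = field(init=False)   # per-pixel semantic label

    def __post_init__(self) -> None:
        self.image = [[255] * self.width for _ in range(self.height)]
        self.mask = [[BACKGROUND] * self.width for _ in range(self.height)]


# A module is any callable that modifies a page using a shared RNG.
Module = Callable[[Page, random.Random], None]


def place_text_lines(page: Page, rng: random.Random,
                     n_lines: int = 3, line_height: int = 2, margin: int = 1) -> None:
    """Stand-in for glyph rendering: draws dark bars and labels them as TEXT."""
    for i in range(n_lines):
        y0 = margin + i * (line_height + 1)
        if y0 + line_height > page.height - margin:
            break  # no vertical room left for another line
        length = rng.randint(page.width // 2, page.width - 2 * margin)
        for y in range(y0, y0 + line_height):
            for x in range(margin, margin + length):
                page.image[y][x] = 0
                page.mask[y][x] = TEXT


def add_noise(page: Page, rng: random.Random, amount: int = 20) -> None:
    """Stand-in for aging: perturbs pixel values; the label mask is unchanged."""
    for row in page.image:
        for x, value in enumerate(row):
            row[x] = max(0, min(255, value + rng.randint(-amount, amount)))


def generate(modules: List[Module], width: int = 20, height: int = 12,
             seed: int = 0) -> Page:
    """Runs the modules in order on a blank page; a fixed seed makes output reproducible."""
    rng = random.Random(seed)
    page = Page(width, height)
    for module in modules:
        module(page, rng)
    return page


if __name__ == "__main__":
    page = generate([place_text_lines, add_noise], seed=42)
    for row in page.mask:
        print("".join("#" if label == TEXT else "." for label in row))
```

Because each module only receives the page and a random generator, the noise step can be removed or replaced (for example by a learned degradation model) without touching the text-placement step, and the mask stays a clean ground truth for segmentation.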

Bibliographic Details
Main Authors:	Lukáš Bureš, Ivan Gruber, Petr Neduchal, Miroslav Hlaváč, Marek Hrúz
Format:	Article
Language:	English
Published:	Russian Academy of Sciences, St. Petersburg Federal Research Center 2019-12-01
Series:	Информатика и автоматизация (Informatics and Automation)
Subjects:	generation of synthetic images; semantic text segmentation; variational autoencoder (VAE); optical character recognition (OCR); aged-looking text generation
Online Access:	http://ia.spcras.ru/index.php/sp/article/view/4527

Similar Items

Efficient Text Bounding Box Identification Using Mask R-CNN: Case of Thai Documents
by: Phanthakan Kiatphaisansophon, et al.
Published: (2024-01-01)

Basic Test Framework for the Evaluation of Text Line Segmentation and Text Parameter Extraction
by: Darko Brodić, et al.
Published: (2010-05-01)

Utilization of OCR and text feature extraction to create a database of labour complaints
by: Yan Puspitarani, et al.
Published: (2020-08-01)

The Giles Ecosystem – Storage, Text Extraction, and OCR of Documents
by: Julia Damerow, et al.
Published: (2017-09-01)

Analysis of Recent Deep Learning Techniques for Arabic Handwritten-Text OCR and Post-OCR Correction
by: Rayyan Najam, et al.
Published: (2023-06-01)

Improving Scene Text Recognition for Indian Languages with Transfer Learning and Font Diversity
by: Sanjana Gunna, et al.
Published: (2022-03-01)

Gradual OCR: An Effective OCR Approach Based on Gradual Detection of Texts
by: Youngki Park, et al.
Published: (2023-11-01)

Survey of Deep Learning Table-to-Text Generation
by: HU Kang, XI Xuefeng, CUI Zhiming, ZHOU Yueyao, QIU Yajin
Published: (2022-11-01)

A comparison of deep transfer learning backbone architecture techniques for printed text detection of different font styles from unstructured documents
by: Supriya Mahadevkar, et al.
Published: (2024-02-01)

Handwritten Character Recognition to Obtain Editable Text
by: Pravalika Jella, et al.
Published: (2023-01-01)

Content Order-Controllable MR-to-Text
by: Keisuke Toyama, et al.
Published: (2023-01-01)

Controllable Text Generation Using Semantic Control Grammar
by: Hyein Seo, et al.
Published: (2023-01-01)

Testing of detection tools for AI-generated text
by: Debora Weber-Wulff, et al.
Published: (2023-12-01)

Diffusion models in text generation: a survey
by: Qiuhua Yi, et al.
Published: (2024-02-01)

Novel Linguistic Steganography Based on Character-Level Text Generation
by: Lingyun Xiang, et al.
Published: (2020-09-01)

Biomedical semantic text summarizer
by: Mahira Kirmani, et al.
Published: (2024-04-01)

Text2Human: text-driven controllable human image generation
by: Jiang, Yuming, et al.
Published: (2022)

Termination as the Basis for Classification of Document Texts
by: Marina V. Kosova, et al.
Published: (2017-12-01)

Digital Texts in Practice
by: Christian Wittern
Published: (2020-10-01)

A Novel Approach for Semantic Extractive Text Summarization
by: Waseemullah, et al.
Published: (2022-04-01)

Transforming the generative pretrained transformer into augmented business text writer
by: Faisal Khalil, et al.
Published: (2022-11-01)

A Transformer-Based Hierarchical Variational AutoEncoder Combined Hidden Markov Model for Long Text Generation
by: Kun Zhao, et al.
Published: (2021-09-01)

Feature-aware conditional GAN for category text generation
by: Li, Xinze, et al.
Published: (2023)

Beyond Lexical Boundaries: LLM-Generated Text Detection for Romanian Digital Libraries
by: Melania Nitu, et al.
Published: (2024-01-01)

The method of presenting the counter-text: main the interpretive strategy of schoolchildren
Published: (2013-04-01)

HierTTS: Expressive End-to-End Text-to-Waveform Using a Multi-Scale Hierarchical Variational Auto-Encoder
by: Zengqiang Shang, et al.
Published: (2023-01-01)

To Studying of Syntactic Subjects: Text as Stylistic Unity
Published: (2013-04-01)

Comparative Evaluation of VAEs, VAE-GANs and AAEs for Anomaly Detection in Network Intrusion Data
by: Mahmoud Mohamed
Published: (2023-12-01)

TextControlGAN: Text-to-Image Synthesis with Controllable Generative Adversarial Networks
by: Hyeeun Ku, et al.
Published: (2023-04-01)

Hybridization of Intelligent Solutions Architecture for Text Understanding and Text Generation
by: Anton Ivaschenko, et al.
Published: (2021-06-01)

Application of Generative Adversarial Networks and Shapley Algorithm Based on Easy Data Augmentation for Imbalanced Text Data
by: Jheng-Long Wu, et al.
Published: (2022-10-01)

A DEEP AUTOENCODER-BASED REPRESENTATION FOR ARABIC TEXT CATEGORIZATION
by: Fatima-zahra El-Alami, et al.
Published: (2020-06-01)

Intent-Controllable Citation Text Generation
by: Shing-Yun Jung, et al.
Published: (2022-05-01)

Formal and Semantic Structure of Contexts of Text Clip AND THEN
by: E. S. Sheremetyeva, et al.
Published: (2022-05-01)

Leveraging the potential of synthetic text for AI in mental healthcare
by: Julia Ive
Published: (2022-10-01)

CTGGAN: Controllable Text Generation with Generative Adversarial Network
by: Zhe Yang, et al.
Published: (2024-04-01)

Development and Evaluation of Emotional Conversation System based on Automated Text Generation
by: Te-Lun Yang, et al.
Published: (2020-11-01)

Sentence-level heuristic tree search for long text generation
by: Zheng Chen, et al.
Published: (2023-09-01)

Data pre-processing to increase the quality of optical text recognition systems
by: Konstantin Dergachov, et al.
Published: (2021-11-01)

TextNetTopics Pro, a topic model-based text classification for short text by integration of semantic and document-topic distribution information
by: Daniel Voskergian, et al.
Published: (2023-10-01)