Handwritten Paragraph Recognition Using Spatial Information on Russian Notebooks Dataset

Bibliographic Details
Main Authors: Samah - Mohammed, Nikolay N Teslya
Format: Article
Language: English
Published: FRUCT 2023-11-01
Series: Proceedings of the XXth Conference of Open Innovations Association FRUCT
Subjects: handwritten paragraph recognition; spatial information; Russian Notebooks dataset
Online Access: https://www.fruct.org/publications/volume-34/fruct34/files/Moh.pdf
author Samah - Mohammed
Nikolay N Teslya
collection DOAJ
description Handwritten paragraph recognition is a vital part of handwritten document analysis, improving accuracy and usability across applications. Recognizing paragraphs in handwritten documents is challenging, however, because of layout variations and irregularities. Spatial information, meaning the spatial relationships between text elements, is essential for accurate paragraph segmentation and document comprehension. Recent work on handwritten Russian recognition has focused primarily on character-level and line-level recognition. This study is the first attempt at paragraph-level recognition for Russian handwriting, using the Vertical Attention Network (VAN) with a hybrid attention method. Key contributions include the preparation of a unique paragraph-level Russian dataset containing around 2600 images with PAGE XML-encoded ground truth. The VAN model was fine-tuned for whole-paragraph recognition, and comprehensive experiments compared its performance against alternative non-layout-aware approaches. This work advances layout-aware recognition of handwritten Russian documents, addressing an unexplored area in the field.
first_indexed 2024-03-08T13:56:35Z
format Article
id doaj.art-e830f47178e64439a4db802e68528dfb
institution Directory Open Access Journal
issn 2305-7254
2343-0737
language English
last_indexed 2024-03-08T13:56:35Z
publishDate 2023-11-01
publisher FRUCT
record_format Article
series Proceedings of the XXth Conference of Open Innovations Association FRUCT
spelling doaj.art-e830f47178e64439a4db802e68528dfb
2024-01-15T12:32:23Z
eng
FRUCT
Proceedings of the XXth Conference of Open Innovations Association FRUCT
2305-7254
2343-0737
2023-11-01
341113
https://youtu.be/wmBpIKJHYZc
10.23919/FRUCT60429.2023.10328173
Handwritten Paragraph Recognition Using Spatial Information on Russian Notebooks Dataset
Samah - Mohammed (ITMO University)
Nikolay N Teslya (SPC RAS)
https://www.fruct.org/publications/volume-34/fruct34/files/Moh.pdf
handwritten paragraph recognition
spatial information
russian notebooks dataset.
title Handwritten Paragraph Recognition Using Spatial Information on Russian Notebooks Dataset
topic handwritten paragraph recognition
spatial information
russian notebooks dataset.
url https://www.fruct.org/publications/volume-34/fruct34/files/Moh.pdf