Text Recognition for Nepalese Manuscripts in Pracalit Script

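This dataset is a model for handwritten text recognition (HTR) of Sanskrit and Newar Nepalese manuscripts in Pracalit script. This paper introduces the state of the field in Newar literature, Newar manuscripts, and HTR engines. It explains our methodology for developing the requisite ground truth consisting of manuscript images and corresponding transcriptions, training our model with a PyLAia engine, and this model’s limitations. This dataset shared on Zenodo can be used by anyone working with manuscripts in Pracalit script, which will benefit the fields of Indology and Newar studies, as well as historical and linguistic analysis.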

Bibliographic Details
Main Authors: Alexander James O’Neill, Nathan Hill
Format: Article
Language: English
Published: Ubiquity Press 2022-11-01
Series: Journal of Open Humanities Data
Subjects: handwritten text recognition, pylaia, transkribus, sanskrit, newar, manuscripts
Online Access: https://openhumanitiesdata.metajnl.com/articles/90
author Alexander James O’Neill
Nathan Hill
collection DOAJ
description This dataset is a model for handwritten text recognition (HTR) of Sanskrit and Newar Nepalese manuscripts in Pracalit script. This paper introduces the state of the field in Newar literature, Newar manuscripts, and HTR engines. It explains our methodology for developing the requisite ground truth consisting of manuscript images and corresponding transcriptions, training our model with a PyLAia engine, and this model’s limitations. This dataset shared on Zenodo can be used by anyone working with manuscripts in Pracalit script, which will benefit the fields of Indology and Newar studies, as well as historical and linguistic analysis.
first_indexed 2024-04-11T06:14:42Z
format Article
id doaj.art-e9a271569e0245f696b333972c476a12
institution Directory Open Access Journal
issn 2059-481X
language English
last_indexed 2024-04-11T06:14:42Z
publishDate 2022-11-01
publisher Ubiquity Press
record_format Article
series Journal of Open Humanities Data
spelling doaj.art-e9a271569e0245f696b333972c476a12
2022-12-22T04:41:06Z
eng
Ubiquity Press
Journal of Open Humanities Data
2059-481X
2022-11-01
8
10.5334/johd.90
77
Text Recognition for Nepalese Manuscripts in Pracalit Script
Alexander James O’Neill [0]
Nathan Hill [1]
[0] Department of East Asian Languages and Cultures, SOAS University of London, London
[1] Department of East Asian Languages and Cultures, SOAS University of London, London, UK; Trinity Centre for Asian Studies, Trinity College Dublin, Dublin
This dataset is a model for handwritten text recognition (HTR) of Sanskrit and Newar Nepalese manuscripts in Pracalit script. This paper introduces the state of the field in Newar literature, Newar manuscripts, and HTR engines. It explains our methodology for developing the requisite ground truth consisting of manuscript images and corresponding transcriptions, training our model with a PyLAia engine, and this model’s limitations. This dataset shared on Zenodo can be used by anyone working with manuscripts in Pracalit script, which will benefit the fields of Indology and Newar studies, as well as historical and linguistic analysis.
https://openhumanitiesdata.metajnl.com/articles/90
handwritten text recognition
pylaia
transkribus
sanskrit
newar
manuscripts
title Text Recognition for Nepalese Manuscripts in Pracalit Script
topic handwritten text recognition
pylaia
transkribus
sanskrit
newar
manuscripts
url https://openhumanitiesdata.metajnl.com/articles/90