Neural OCR Post-Hoc Correction of Historical Corpora
Abstract: Optical character recognition (OCR) is crucial for a deeper access to historical collections. OCR needs to account for orthographic variations, typefaces, or language evolution (i.e., new letters, word spellings), as the main source of character, word, or word segmentation tr...
Main Authors: | Lijun Lyu, Maria Koutraki, Martin Krickl, Besnik Fetahu |
---|---|
Format: | Article |
Language: | English |
Published: | The MIT Press, 2021-01-01 |
Series: | Transactions of the Association for Computational Linguistics |
Online Access: | https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00379/100788/Neural-OCR-Post-Hoc-Correction-of-Historical |
Similar Items
- Analysis of Recent Deep Learning Techniques for Arabic Handwritten-Text OCR and Post-OCR Correction
  by: Rayyan Najam, et al.
  Published: (2023-06-01)
- An Efficient Unsupervised Approach for OCR Error Correction of Vietnamese OCR Text
  by: Quoc-Dung Nguyen, et al.
  Published: (2023-01-01)
- Lexically Aware Semi-Supervised Learning for OCR Post-Correction
  by: Shruti Rijhwani, et al.
  Published: (2021-01-01)
- Corpora and historical linguistics Corpora e linguística histórica
  by: Merja Kytö
  Published: (2011-01-01)
- Ground Truth OCR Sample Data of Finnish Historical Newspapers and Journals in Data Improvement Validation of a re-OCRing Process
  by: Kimmo Kettunen, et al.
  Published: (2020-02-01)