Lexically Aware Semi-Supervised Learning for OCR Post-Correction

Abstract: Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents. Optical character recognition (OCR) can be used to produce digitized text, and previous work has demonstrated the utility of neural post-correction method...

Bibliographic Details
Main Authors: Shruti Rijhwani, Daisy Rosenblum, Antonios Anastasopoulos, Graham Neubig
Format: Article
Language: English
Published: The MIT Press 2021-01-01
Series: Transactions of the Association for Computational Linguistics
Online Access: https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00427/108475/Lexically-Aware-Semi-Supervised-Learning-for-OCR