Generating an Ordered Data Set from an OCR Text File
This tutorial illustrates strategies for taking raw OCR output from a scanned text, parsing it to isolate and correct essential elements of metadata, and generating an ordered data set (a Python dictionary) from it. These illustrations are specific to a particular text, but the overall strategy, and...
Main Author: | Jon Crump |
---|---|
Format: | Article |
Language: | English |
Published: | Editorial Board of the Programming Historian, 2014-11-01 |
Series: | The Programming Historian |
Subjects: | data manipulation, OCR, Python, dataset |
Online Access: | http://programminghistorian.org/lessons/generating-an-ordered-data-set-from-an-OCR-text-file |
_version_ | 1811242719172886528 |
---|---|
author | Jon Crump |
collection | DOAJ |
description | This tutorial illustrates strategies for taking raw OCR output from a scanned text, parsing it to isolate and correct essential elements of metadata, and generating an ordered data set (a Python dictionary) from it. These illustrations are specific to a particular text, but the overall strategy, and some of the individual procedures, can be adapted to organize any scanned text, even if it doesn’t look like this one. |
first_indexed | 2024-04-12T13:56:11Z |
format | Article |
id | doaj.art-96b59da1c0c64634b587542ac7d48265 |
institution | Directory Open Access Journal |
issn | 2397-2068 |
language | English |
last_indexed | 2024-04-12T13:56:11Z |
publishDate | 2014-11-01 |
publisher | Editorial Board of the Programming Historian |
record_format | Article |
series | The Programming Historian |
spelling | doaj.art-96b59da1c0c64634b587542ac7d48265; 2022-12-22T03:30:23Z; eng; Editorial Board of the Programming Historian; The Programming Historian; 2397-2068; 2014-11-01; Generating an Ordered Data Set from an OCR Text File; Jon Crump; 0; Freelance digital humanist; This tutorial illustrates strategies for taking raw OCR output from a scanned text, parsing it to isolate and correct essential elements of metadata, and generating an ordered data set (a python dictionary) from it. These illustrations are specific to a particular text, but the overall strategy, and some of the individual procedures, can be adapted to organize any scanned text, even if it doesn’t look like this one.; http://programminghistorian.org/lessons/generating-an-ordered-data-set-from-an-OCR-text-file; data manipulation; OCR; Python; dataset |
title | Generating an Ordered Data Set from an OCR Text File |
topic | data manipulation OCR Python dataset |
url | http://programminghistorian.org/lessons/generating-an-ordered-data-set-from-an-OCR-text-file |