The Giles Ecosystem – Storage, Text Extraction, and OCR of Documents
In the digital humanities, there is a constant need to turn images and PDF files into plain text so that analyses such as topic modelling and named entity recognition can be applied. However, although different solutions exist to extract text embedded in PDF files or run OCR on images, th...
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: | Ubiquity Press, 2017-09-01 |
Series: | Journal of Open Research Software |
Subjects: | |
Online Access: | https://openresearchsoftware.metajnl.com/articles/164 |