The Giles Ecosystem – Storage, Text Extraction, and OCR of Documents
In the digital humanities, there is a constant need to turn images and PDF files into plain text to apply analyses such as topic modelling, named entity recognition, and other techniques. However, although there exist different solutions to extract text embedded in PDF files or run OCR on images, th...
Main Authors: | Julia Damerow, B. R. Erick Peirson, Manfred D. Laubichler |
---|---|
Format: | Article |
Language: | English |
Published: | Ubiquity Press, 2017-09-01 |
Series: | Journal of Open Research Software |
Subjects: | |
Online Access: | https://openresearchsoftware.metajnl.com/articles/164 |
Similar Items
- Utilization of OCR and text feature extraction to create a database of labour complaints
  by: Yan Puspitarani, et al.
  Published: (2020-08-01)
- Analysis of Recent Deep Learning Techniques for Arabic Handwritten-Text OCR and Post-OCR Correction
  by: Rayyan Najam, et al.
  Published: (2023-06-01)
- A New Big Data Processing Framework for the Online Roadshow
  by: Kang-Ren Leow, et al.
  Published: (2023-06-01)
- Gradual OCR: An Effective OCR Approach Based on Gradual Detection of Texts
  by: Youngki Park, et al.
  Published: (2023-11-01)
- Basic Test Framework for the Evaluation of Text Line Segmentation and Text Parameter Extraction
  by: Darko Brodić, et al.
  Published: (2010-05-01)