A Standardized Project Gutenberg Corpus for Statistical Analysis of Natural Language and Quantitative Linguistics

The use of Project Gutenberg (PG) as a text corpus has been extremely popular in statistical analysis of language for more than 25 years. However, in contrast to other major linguistic datasets of similar importance, no consensual full version of PG exists to date. In fact, most PG studies so far ei...

Bibliographic Details
Main Authors: Martin Gerlach, Francesc Font-Clos
Format: Article
Language: English
Published: MDPI AG 2020-01-01
Series: Entropy
Online Access: https://www.mdpi.com/1099-4300/22/1/126