HATHI 1M: Introducing a Million Page Historical Prose Dataset in English from the Hathi Trust

We present a new dataset built on prior work consisting of 1,671,370 randomly sampled pages of English-language prose roughly divided between modes of fictional and non-fictional writing and published between the years 1800 and 2000. In addition to focusing on the “page” as the basic bibliographic...

Bibliographic Details
Main Authors: Sunyam Bagga, Andrew Piper
Format: Article
Language: English
Published: Ubiquity Press 2022-03-01
Series: Journal of Open Humanities Data
Online Access: https://openhumanitiesdata.metajnl.com/articles/71