HATHI 1M: Introducing a Million Page Historical Prose Dataset in English from the Hathi Trust
We present a new dataset, built on prior work, consisting of 1,671,370 randomly sampled pages of English-language prose, roughly divided between fictional and non-fictional modes of writing and published between 1800 and 2000. In addition to focusing on the “page” as the basic bibliographic...
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: | Ubiquity Press, 2022-03-01 |
Series: | Journal of Open Humanities Data |
Subjects: | |
Online Access: | https://openhumanitiesdata.metajnl.com/articles/71 |