The TRANSCOMP Dataset of Literary Translations from 120 Languages and a Parallel Collection of English-language Originals

Bibliographic Details
Main Authors: Matt Erlin, Andrew Piper, Douglas Knox, Stephen Pentecost, Allie Blank
Format: Article
Language: English
Published: Ubiquity Press 2022-12-01
Series: Journal of Open Humanities Data
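Subjects: translation studies; computational literary studies; world literature; natural language processing; text corpus; text collection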
Online Access: https://openhumanitiesdata.metajnl.com/articles/94
_version_ 1797960148910604288
author Matt Erlin
Andrew Piper
Douglas Knox
Stephen Pentecost
Allie Blank
author_facet Matt Erlin
Andrew Piper
Douglas Knox
Stephen Pentecost
Allie Blank
author_sort Matt Erlin
collection DOAJ
description The TRANSCOMP Dataset of Literary Translations is a collection of document-level word frequencies sampled from 10,631 translations into English of global literary fiction published since 1950, together with a historically matched parallel corpus of 10,682 fictional works originally published in English. We provide CSV files with word frequency counts for 10,000-word samples taken from each text. The associated metadata is available in a separate CSV. These data will be useful to literary scholars and linguists working in translation studies, and those interested in the linguistic, stylistic, and thematic specificity of translations from particular regions.
first_indexed 2024-04-11T00:40:46Z
format Article
id doaj.art-d370f45f60a242a2bfa3e761cfb3d31b
institution Directory Open Access Journal
issn 2059-481X
language English
last_indexed 2024-04-11T00:40:46Z
publishDate 2022-12-01
publisher Ubiquity Press
record_format Article
series Journal of Open Humanities Data
spelling doaj.art-d370f45f60a242a2bfa3e761cfb3d31b 2023-01-06T06:32:56Z eng Ubiquity Press Journal of Open Humanities Data 2059-481X 2022-12-01 810.5334/johd.9480 The TRANSCOMP Dataset of Literary Translations from 120 Languages and a Parallel Collection of English-language Originals Matt Erlin (0); Andrew Piper (1); Douglas Knox (2); Stephen Pentecost (3); Allie Blank (4) (0) Germanic Languages and Literatures, Washington University, St. Louis; (1) Languages, Literatures, and Cultures, McGill University, Montreal; (2) Humanities Digital Workshop, Washington University, St. Louis; (3) Humanities Digital Workshop, Washington University, St. Louis; (4) Humanities Digital Workshop, Washington University, St. Louis The TRANSCOMP Dataset of Literary Translations is a collection of document-level word frequencies sampled from 10,631 translations into English of global literary fiction published since 1950, together with a historically matched parallel corpus of 10,682 fictional works originally published in English. We provide CSV files with word frequency counts for 10,000-word samples taken from each text. The associated metadata is available in a separate CSV. These data will be useful to literary scholars and linguists working in translation studies, and those interested in the linguistic, stylistic, and thematic specificity of translations from particular regions. https://openhumanitiesdata.metajnl.com/articles/94 translation studies; computational literary studies; world literature; natural language processing; text corpus; text collection
spellingShingle Matt Erlin
Andrew Piper
Douglas Knox
Stephen Pentecost
Allie Blank
The TRANSCOMP Dataset of Literary Translations from 120 Languages and a Parallel Collection of English-language Originals
Journal of Open Humanities Data
translation studies
computational literary studies
world literature
natural language processing
text corpus
text collection
title The TRANSCOMP Dataset of Literary Translations from 120 Languages and a Parallel Collection of English-language Originals
title_full The TRANSCOMP Dataset of Literary Translations from 120 Languages and a Parallel Collection of English-language Originals
title_fullStr The TRANSCOMP Dataset of Literary Translations from 120 Languages and a Parallel Collection of English-language Originals
title_full_unstemmed The TRANSCOMP Dataset of Literary Translations from 120 Languages and a Parallel Collection of English-language Originals
title_short The TRANSCOMP Dataset of Literary Translations from 120 Languages and a Parallel Collection of English-language Originals
title_sort transcomp dataset of literary translations from 120 languages and a parallel collection of english language originals
topic translation studies
computational literary studies
world literature
natural language processing
text corpus
text collection
url https://openhumanitiesdata.metajnl.com/articles/94
work_keys_str_mv AT matterlin thetranscompdatasetofliterarytranslationsfrom120languagesandaparallelcollectionofenglishlanguageoriginals
AT andrewpiper thetranscompdatasetofliterarytranslationsfrom120languagesandaparallelcollectionofenglishlanguageoriginals
AT douglasknox thetranscompdatasetofliterarytranslationsfrom120languagesandaparallelcollectionofenglishlanguageoriginals
AT stephenpentecost thetranscompdatasetofliterarytranslationsfrom120languagesandaparallelcollectionofenglishlanguageoriginals
AT allieblank thetranscompdatasetofliterarytranslationsfrom120languagesandaparallelcollectionofenglishlanguageoriginals
AT matterlin transcompdatasetofliterarytranslationsfrom120languagesandaparallelcollectionofenglishlanguageoriginals
AT andrewpiper transcompdatasetofliterarytranslationsfrom120languagesandaparallelcollectionofenglishlanguageoriginals
AT douglasknox transcompdatasetofliterarytranslationsfrom120languagesandaparallelcollectionofenglishlanguageoriginals
AT stephenpentecost transcompdatasetofliterarytranslationsfrom120languagesandaparallelcollectionofenglishlanguageoriginals
AT allieblank transcompdatasetofliterarytranslationsfrom120languagesandaparallelcollectionofenglishlanguageoriginals