A Data Set of Final Year High School Examination Texts of South African Home and First Additional Language Subjects
This article describes a data set of reading comprehension and summary writing texts that were used in final-year high school examinations in South Africa between 2008 and 2020. It contains texts for eleven official South African languages. PDF versions of the texts stem from South Africa’s Departme...
Main Authors: | Johannes Sibeko, Menno van Zaanen |
---|---|
Format: | Article |
Language: | English |
Published: | Ubiquity Press 2023-07-01 |
Series: | Journal of Open Humanities Data |
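Subjects: | linguistic corpus, indigenous languages, examination texts, reading comprehension, summary writing, final year high school |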
Online Access: | https://account.openhumanitiesdata.metajnl.com/index.php/up-j-johd/article/view/108 |
_version_ | 1797748824125472768 |
---|---|
author | Johannes Sibeko, Menno van Zaanen |
author_facet | Johannes Sibeko, Menno van Zaanen |
author_sort | Johannes Sibeko |
collection | DOAJ |
description | This article describes a data set of reading comprehension and summary writing texts that were used in final-year high school examinations in South Africa between 2008 and 2020. It contains texts for eleven official South African languages. PDF versions of the texts stem from South Africa’s Department of Basic Education’s online public access repository. Plain text is extracted from the PDFs and the texts are tokenized. The data set contains 429 full-text files with 929 manually extracted comprehension and summary writing texts. The data is useful for studies investigating, e.g., linguistic properties, text readability, text properties, and linguistic complexity in any of the eleven languages. Furthermore, both intra-language and inter-language comparisons or investigations can be made. |
first_indexed | 2024-03-12T16:10:25Z |
format | Article |
id | doaj.art-5bf060e707a24bfdbece838e216442c4 |
institution | Directory Open Access Journal |
issn | 2059-481X |
language | English |
last_indexed | 2024-03-12T16:10:25Z |
publishDate | 2023-07-01 |
publisher | Ubiquity Press |
record_format | Article |
series | Journal of Open Humanities Data |
spelling | doaj.art-5bf060e707a24bfdbece838e216442c4 ; 2023-08-09T13:59:18Z ; eng ; Ubiquity Press ; Journal of Open Humanities Data ; 2059-481X ; 2023-07-01 ; 999 ; 10.5334/johd.108 ; 108 ; A Data Set of Final Year High School Examination Texts of South African Home and First Additional Language Subjects ; Johannes Sibeko ; 0 ; https://orcid.org/0000-0003-3586-7491 ; Menno van Zaanen ; 1 ; https://orcid.org/0000-0003-1841-2444 ; Linguistics and Applied Linguistics, Nelson Mandela University, Gqeberha ; South African Centre for Digital Language Resources, North-West University, Potchefstroom ; This article describes a data set of reading comprehension and summary writing texts that were used in final-year high school examinations in South Africa between 2008 and 2020. It contains texts for eleven official South African languages. PDF versions of the texts stem from South Africa’s Department of Basic Education’s online public access repository. Plain text is extracted from the PDFs and the texts are tokenized. The data set contains 429 full-text files with 929 manually extracted comprehension and summary writing texts. The data is useful for studies investigating, e.g., linguistic properties, text readability, text properties, and linguistic complexity in any of the eleven languages. Furthermore, both intra-language and inter-language comparisons or investigations can be made. ; https://account.openhumanitiesdata.metajnl.com/index.php/up-j-johd/article/view/108 ; linguistic corpus ; indigenous languages ; examination texts ; reading comprehension ; summary writing ; final year high school |
title | A Data Set of Final Year High School Examination Texts of South African Home and First Additional Language Subjects |
topic | linguistic corpus, indigenous languages, examination texts, reading comprehension, summary writing, final year high school |
url | https://account.openhumanitiesdata.metajnl.com/index.php/up-j-johd/article/view/108 |