A Data Set of Final Year High School Examination Texts of South African Home and First Additional Language Subjects

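This article describes a data set of reading comprehension and summary writing texts that were used in final-year high school examinations in South Africa between 2008 and 2020. It contains texts for eleven official South African languages. PDF versions of the texts stem from South Africa’s Department of Basic Education’s online public access repository. Plain text is extracted from the PDFs and the texts are tokenized. The data set contains 429 full-text files with 929 manually extracted comprehension and summary writing texts. The data is useful for studies investigating, e.g., linguistic properties, text readability, text properties, and linguistic complexity in any of the eleven languages. Furthermore, both intra-language and inter-language comparisons or investigations can be made.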


Bibliographic Details
Main Authors: Johannes Sibeko, Menno van Zaanen
Format: Article
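Language: English
Published: Ubiquity Press 2023-07-01
Series: Journal of Open Humanities Data
Subjects: linguistic corpus; indigenous languages; examination texts; reading comprehension; summary writing; final year high school
Online Access: https://account.openhumanitiesdata.metajnl.com/index.php/up-j-johd/article/view/108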
collection DOAJ
description This article describes a data set of reading comprehension and summary writing texts that were used in final-year high school examinations in South Africa between 2008 and 2020. It contains texts for eleven official South African languages. PDF versions of the texts stem from South Africa’s Department of Basic Education’s online public access repository. Plain text is extracted from the PDFs and the texts are tokenized. The data set contains 429 full-text files with 929 manually extracted comprehension and summary writing texts. The data is useful for studies investigating, e.g., linguistic properties, text readability, text properties, and linguistic complexity in any of the eleven languages. Furthermore, both intra-language and inter-language comparisons or investigations can be made.
first_indexed 2024-03-12T16:10:25Z
id doaj.art-5bf060e707a24bfdbece838e216442c4
institution Directory Open Access Journal
issn 2059-481X
last_indexed 2024-03-12T16:10:25Z
spelling doaj.art-5bf060e707a24bfdbece838e216442c4 | 2023-08-09T13:59:18Z | eng | Ubiquity Press | Journal of Open Humanities Data | 2059-481X | 2023-07-01 | 999 | 10.5334/johd.108 | 108 | A Data Set of Final Year High School Examination Texts of South African Home and First Additional Language Subjects | Johannes Sibeko (https://orcid.org/0000-0003-3586-7491; Linguistics and Applied Linguistics, Nelson Mandela University, Gqeberha) | Menno van Zaanen (https://orcid.org/0000-0003-1841-2444; South African Centre for Digital Language Resources, North-West University, Potchefstroom) | This article describes a data set of reading comprehension and summary writing texts that were used in final-year high school examinations in South Africa between 2008 and 2020. It contains texts for eleven official South African languages. PDF versions of the texts stem from South Africa’s Department of Basic Education’s online public access repository. Plain text is extracted from the PDFs and the texts are tokenized. The data set contains 429 full-text files with 929 manually extracted comprehension and summary writing texts. The data is useful for studies investigating, e.g., linguistic properties, text readability, text properties, and linguistic complexity in any of the eleven languages. Furthermore, both intra-language and inter-language comparisons or investigations can be made. | https://account.openhumanitiesdata.metajnl.com/index.php/up-j-johd/article/view/108 | linguistic corpus; indigenous languages; examination texts; reading comprehension; summary writing; final year high school
topic linguistic corpus
indigenous languages
examination texts
reading comprehension
summary writing
final year high school