Dataset for comparable evaluation of machine translation between 11 South African languages

This data article describes the Autshumato machine translation evaluation set. The evaluation set contains data that can be used to evaluate machine translation systems between any of the 11 official South African languages. The dataset is parallel with four reference translations available for each...

Bibliographic Details
Main Authors: Cindy A. McKellar, Martin J. Puttkammer
Format: Article
Language: English
Published: Elsevier 2020-04-01
Series: Data in Brief
Online Access: http://www.sciencedirect.com/science/article/pii/S2352340920300408
collection DOAJ
description This data article describes the Autshumato machine translation evaluation set. The evaluation set contains data that can be used to evaluate machine translation systems between any of the 11 official South African languages. The dataset is parallel with four reference translations available for each of the following languages: Afrikaans, English, isiNdebele, isiXhosa, isiZulu, Sepedi, Sesotho, Setswana, Siswati, Tshivenḓa and Xitsonga.
Keywords: Machine translation, Automatic evaluation, Natural language processing, Human language technology
first_indexed 2024-12-13T05:32:36Z
id doaj.art-6b253a05543c4ea095d0777617096c05
institution Directory of Open Access Journals
issn 2352-3409
last_indexed 2024-12-13T05:32:36Z
volume 29
affiliation Cindy A. McKellar (corresponding author): Centre for Text Technology, North-West University, South Africa
Martin J. Puttkammer: Centre for Text Technology, North-West University, South Africa