Strategies for building wordnets for under-resourced languages: The case of African languages

Bibliographic Details
Main Authors: Sonja E. Bosch, Marissa Griesel
Format: Article
Language: Afrikaans
Published: AOSIS 2017-03-01
Series: Literator
Online Access: https://literator.org.za/index.php/literator/article/view/1351
collection DOAJ
description The African Wordnet Project (AWN) aims at building wordnets for five African languages: Setswana, isiXhosa, isiZulu, Sesotho sa Leboa (also referred to as Sepedi or Northern Sotho) and Tshivenda. Currently, the so-called expand model, based on the structure of the English Princeton WordNet (PWN), is used to continually develop the African Wordnets manually. This is labour-intensive work that needs to be performed by linguistic experts, guided by several considerations such as the level of lexicalisation of a term in the African language. Until now, linguists have been responsible for identifying and translating appropriate synsets without much help from electronic resources, because in the case of African languages even basic resources such as computer-readable and electronic bilingual wordlists are usually not freely available. Methods to speed up the manual development of synsets and ease the workload of the human language experts were recently investigated. These centred on using the minimal amount of information available in bilingual dictionaries to identify synsets in the PWN that should be included in the AWN, transferring information from dictionaries to the wordnet, and presenting the potential synsets to linguists for final approval and inclusion in the wordnets. In this article, we describe the methodology developed for building the African Wordnets, a potentially significant resource for natural language processing applications. Available resources that could be taken advantage of and resources that had to be developed are investigated, and initial results and future plans are explained.
id doaj.art-afb1ae04a1cd43d9b65f6f05fd7f13da
institution Directory of Open Access Journals
issn 0258-2279; 2219-8237
citation Literator, vol. 38, no. 1, pp. e1-e12, 2017-03-01
doi 10.4102/lit.v38i1.1351
affiliation Sonja E. Bosch: Department of African Languages, University of South Africa; African Wordnet Project
affiliation Marissa Griesel: Department of African Languages, University of South Africa; African Wordnet Project
topic African wordnet; under-resourced languages; semi-automatic extraction; bilingual dictionaries