You can’t suggest that?!
In this article, we study the correction of spelling errors, specifically how spelling errors are made and how we can model them computationally in order to fix them. The article describes two different approaches to generating spelling correction suggestions for three Uralic languages: Estonian...
Main Authors: | Heiki-Jaan Kaalep, Flammie Pirinen, Sjur Moshagen |
---|---|
Format: | Article |
Language: | English |
Published: | Septentrio Academic Publishing, 2022-08-01 |
Series: | Nordlyd: Tromsø University Working Papers on Language & Linguistics |
Subjects: | Spell-Checking, rule-based, fsa, machine learning, sami languages, estonian |
Online Access: | https://septentrio.uit.no/index.php/nordlyd/article/view/6349 |
_version_ | 1798005970729697280 |
---|---|
author | Heiki-Jaan Kaalep, Flammie Pirinen, Sjur Moshagen |
author_facet | Heiki-Jaan Kaalep, Flammie Pirinen, Sjur Moshagen |
author_sort | Heiki-Jaan Kaalep |
collection | DOAJ |
description |
In this article, we study the correction of spelling errors, specifically how spelling errors are made and how we can model them computationally in order to fix them.
The article describes two different approaches to generating spelling correction suggestions for three Uralic languages: Estonian, North Sámi and South Sámi.
The first approach to modelling spelling errors is rule-based: experts write rules that describe the kinds of errors that are made, and these rules are compiled into a finite-state automaton that models the errors.
The second is data-based: we show a machine learning algorithm a corpus of errors that humans have made, and it creates a neural network that can model the errors.
Both approaches require collecting error corpora and understanding their contents; we therefore also describe in detail the actual errors we have seen.
We find that while both approaches produce error correction systems, with current resources the expert-built systems are still more reliable.
|
first_indexed | 2024-04-11T12:48:34Z |
format | Article |
id | doaj.art-66fab39dfd704bbc974aa4e1b585c8fa |
institution | Directory Open Access Journal |
issn | 1503-8599 |
language | English |
last_indexed | 2024-04-11T12:48:34Z |
publishDate | 2022-08-01 |
publisher | Septentrio Academic Publishing |
record_format | Article |
series | Nordlyd: Tromsø University Working Papers on Language & Linguistics |
spelling | doaj.art-66fab39dfd704bbc974aa4e1b585c8fa; 2022-12-22T04:23:17Z; eng; Septentrio Academic Publishing; Nordlyd: Tromsø University Working Papers on Language & Linguistics; 1503-8599; 2022-08-01; 46; 1; 10.7557/12.6349; You can’t suggest that?!; Heiki-Jaan Kaalep (Tartu ülikool), Flammie Pirinen (UiT Norgga árktalaš universitehta), Sjur Moshagen (UiT Norgga árktalaš universitehta); In this article, we study the correction of spelling errors, specifically how spelling errors are made and how we can model them computationally in order to fix them. The article describes two different approaches to generating spelling correction suggestions for three Uralic languages: Estonian, North Sámi and South Sámi. The first approach to modelling spelling errors is rule-based: experts write rules that describe the kinds of errors that are made, and these rules are compiled into a finite-state automaton that models the errors. The second is data-based: we show a machine learning algorithm a corpus of errors that humans have made, and it creates a neural network that can model the errors. Both approaches require collecting error corpora and understanding their contents; we therefore also describe in detail the actual errors we have seen. We find that while both approaches produce error correction systems, with current resources the expert-built systems are still more reliable.; https://septentrio.uit.no/index.php/nordlyd/article/view/6349; Spell-Checking; rule-based; fsa; machine learning; sami languages; estonian |
spellingShingle | Heiki-Jaan Kaalep Flammie Pirinen Sjur Moshagen You can’t suggest that?! Nordlyd: Tromsø University Working Papers on Language & Linguistics Spell-Checking rule-based fsa machine learning sami languages estonian |
title | You can’t suggest that?! |
title_full | You can’t suggest that?! |
title_fullStr | You can’t suggest that?! |
title_full_unstemmed | You can’t suggest that?! |
title_short | You can’t suggest that?! |
title_sort | you can t suggest that |
topic | Spell-Checking, rule-based, fsa, machine learning, sami languages, estonian |
url | https://septentrio.uit.no/index.php/nordlyd/article/view/6349 |
work_keys_str_mv | AT heikijaankaalep youcantsuggestthat AT flammiepirinen youcantsuggestthat AT sjurmoshagen youcantsuggestthat |
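
The rule-based approach mentioned in the description can be illustrated with a small sketch. This is not the authors' system: the lexicon, the rewrite rules and their weights below are invented for illustration, and a plain uniform-cost search in Python stands in for the compiled finite-state automaton the article describes. The idea it shows is only that each expert-written rule maps a likely error back to its intended form at some cost, and that suggestions are the lexicon words reachable most cheaply.

```python
from heapq import heappush, heappop

# Toy lexicon and hand-written error rules; both are hypothetical,
# not taken from the article.
LEXICON = {"kala", "kana", "maja"}
# (wrong, right, weight): rewrite `wrong` as `right` at cost `weight`.
RULES = [
    ("n", "l", 1.0),
    ("j", "l", 2.0),
    ("aa", "a", 0.5),
    ("m", "k", 2.0),
]


def suggest(word, max_cost=3.0):
    """Return (cost, suggestion) pairs, cheapest first (uniform-cost search)."""
    best = {word: 0.0}      # cheapest known cost for each string seen so far
    queue = [(0.0, word)]   # priority queue ordered by accumulated cost
    found = []
    while queue:
        cost, current = heappop(queue)
        if cost > best.get(current, float("inf")):
            continue  # stale entry: a cheaper path to this string exists
        if current in LEXICON:
            found.append((cost, current))
        for wrong, right, weight in RULES:
            new_cost = cost + weight
            if new_cost > max_cost:
                continue
            # Apply the rule at every position where `wrong` occurs.
            start = current.find(wrong)
            while start != -1:
                candidate = current[:start] + right + current[start + len(wrong):]
                if new_cost < best.get(candidate, float("inf")):
                    best[candidate] = new_cost
                    heappush(queue, (new_cost, candidate))
                start = current.find(wrong, start + 1)
    return found


print(suggest("kaana"))
```

With these toy rules, `suggest("kaana")` ranks `kana` ahead of `kala`, because undoing a doubled vowel is cheaper than doing that and also replacing a consonant. A finite-state implementation would encode the same rules and weights as a weighted transducer composed with the lexicon rather than searching over strings.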