You can’t suggest that?!

In this article, we study the correction of spelling errors, specifically how spelling errors are made and how we can model them computationally in order to fix them. The article describes two different approaches to generating spelling correction suggestions for three Uralic languages: Estonian...


Bibliographic Details
Main Authors: Heiki-Jaan Kaalep, Flammie Pirinen, Sjur Moshagen
Format: Article
Language: English
Published: Septentrio Academic Publishing 2022-08-01
Series: Nordlyd: Tromsø University Working Papers on Language & Linguistics
Subjects: Spell-Checking, rule-based, fsa, machine learning, sami languages, estonian
Online Access: https://septentrio.uit.no/index.php/nordlyd/article/view/6349
author Heiki-Jaan Kaalep
Flammie Pirinen
Sjur Moshagen
collection DOAJ
description In this article, we study the correction of spelling errors, specifically how spelling errors are made and how we can model them computationally in order to fix them. The article describes two different approaches to generating spelling correction suggestions for three Uralic languages: Estonian, North Sámi and South Sámi. The first approach to modelling spelling errors is rule-based: experts write rules that describe the kinds of errors that are made, and these rules are compiled into a finite-state automaton that models the errors. The second is data-based: we show a machine learning algorithm a corpus of errors that humans have made, and it creates a neural network that can model the errors. Both approaches require collecting error corpora and understanding their contents; therefore we also describe in detail the actual errors we have seen. We find that while both approaches create error correction systems, with current resources the expert-built systems are still more reliable.
first_indexed 2024-04-11T12:48:34Z
format Article
id doaj.art-66fab39dfd704bbc974aa4e1b585c8fa
institution Directory Open Access Journal
issn 1503-8599
language English
last_indexed 2024-04-11T12:48:34Z
publishDate 2022-08-01
publisher Septentrio Academic Publishing
record_format Article
series Nordlyd: Tromsø University Working Papers on Language & Linguistics
spelling doaj.art-66fab39dfd704bbc974aa4e1b585c8fa; 2022-12-22T04:23:17Z; eng; Septentrio Academic Publishing; Nordlyd: Tromsø University Working Papers on Language & Linguistics; 1503-8599; 2022-08-01; 46; 1; 10.7557/12.6349; You can’t suggest that?!; Heiki-Jaan Kaalep (0); Flammie Pirinen (1); Sjur Moshagen (2); Tartu ülikool; UiT Norgga árktalaš universitehta; UiT Norgga árktalaš universitehta; https://septentrio.uit.no/index.php/nordlyd/article/view/6349; Spell-Checking; rule-based; fsa; machine learning; sami languages; estonian
title You can’t suggest that?!
topic Spell-Checking
rule-based
fsa
machine learning
sami languages
estonian
url https://septentrio.uit.no/index.php/nordlyd/article/view/6349