Unsupervised Spelling Correction for Slovak
This paper introduces a method to automatically propose and choose a correction for an incorrectly written word in a large text corpus written in Slovak. This task can be described as a process of finding the best matching sequence of correct words to a list of incorrectly spelled words, found in th...
Main Authors: | Daniel Hladek, Jan Stas, Jozef Juhar |
---|---|
Format: | Article |
Language: | English |
Published: | VSB-Technical University of Ostrava, 2013-01-01 |
Series: | Advances in Electrical and Electronic Engineering |
Subjects: | automatic spelling correction; hidden Markov model; natural language processing |
Online Access: | http://advances.utc.sk/index.php/AEEE/article/view/898 |
_version_ | 1797827069402415104 |
---|---|
author | Daniel Hladek, Jan Stas, Jozef Juhar |
collection | DOAJ |
description | This paper introduces a method to automatically propose and choose a correction for an incorrectly written word in a large text corpus written in Slovak. The task can be described as finding the best matching sequence of correct words for a list of incorrectly spelled words found in the input. The knowledge base of the classification system (statistics about sequences of correctly typed words and possible corrections for incorrectly typed words) can be mathematically described as a hidden Markov model. The best matching sequence of correct words is found using the Viterbi algorithm. The system will be evaluated on a manually corrected test set. |
first_indexed | 2024-04-09T12:42:20Z |
format | Article |
id | doaj.art-db6bc109eaeb4ab0a87f2bb6ecd7e7c5 |
institution | Directory Open Access Journal |
issn | 1336-1376 1804-3119 |
language | English |
last_indexed | 2024-04-09T12:42:20Z |
publishDate | 2013-01-01 |
publisher | VSB-Technical University of Ostrava |
record_format | Article |
series | Advances in Electrical and Electronic Engineering |
spelling | doaj.art-db6bc109eaeb4ab0a87f2bb6ecd7e7c5; 2023-05-14T20:50:08Z; eng; VSB-Technical University of Ostrava; Advances in Electrical and Electronic Engineering; 1336-1376; 1804-3119; 2013-01-01; 11; 5; 392; 397; 10.15598/aeee.v11i5.898; 617; Unsupervised Spelling Correction for Slovak; Daniel Hladek (0), Jan Stas, Jozef Juhar; Department of Electronics and Multimedia Communications, Faculty of Electrical Engineering, Technical University of Kosice, Park Komenskeho 13, 042 00 Kosice, Slovak Republic; This paper introduces a method to automatically propose and choose a correction for an incorrectly written word in a large text corpus written in Slovak. The task can be described as finding the best matching sequence of correct words for a list of incorrectly spelled words found in the input. The knowledge base of the classification system (statistics about sequences of correctly typed words and possible corrections for incorrectly typed words) can be mathematically described as a hidden Markov model. The best matching sequence of correct words is found using the Viterbi algorithm. The system will be evaluated on a manually corrected test set.; http://advances.utc.sk/index.php/AEEE/article/view/898; automatic spelling correction; hidden Markov model; natural language processing |
title | Unsupervised Spelling Correction for Slovak |
topic | automatic spelling correction; hidden Markov model; natural language processing |
url | http://advances.utc.sk/index.php/AEEE/article/view/898 |
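
The description above frames correction as decoding a hidden Markov model with the Viterbi algorithm. The following is a minimal sketch of that idea, not the authors' implementation. Everything specific in it is an assumption made for illustration: the hidden states are dictionary words within a small edit distance of each observed word, the transition scores are add-one smoothed bigram counts, the emission score is a simple edit-distance penalty, and the toy vocabulary and counts are invented.

```python
import math


def edit_distance(a, b):
    """Levenshtein distance, computed with one rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def candidates(word, vocab, max_dist=2):
    """Hidden states for one observation (assumed: words within max_dist edits)."""
    found = [w for w in vocab if edit_distance(word, w) <= max_dist]
    return found or [word]  # keep the word unchanged if nothing is close


def emission_logp(observed, state, penalty=2.0):
    """Assumed error model: each edit costs a fixed log-probability penalty."""
    return -penalty * edit_distance(observed, state)


def transition_logp(prev, cur, unigrams, bigrams, vocab_size):
    """Add-one smoothed bigram log-probability log P(cur | prev)."""
    return math.log((bigrams.get((prev, cur), 0) + 1)
                    / (unigrams.get(prev, 0) + vocab_size))


def viterbi_correct(observed, vocab, unigrams, bigrams):
    """Return the most probable sequence of correct words for `observed`."""
    total, size = sum(unigrams.values()), len(vocab)
    scores = {s: math.log((unigrams.get(s, 0) + 1) / (total + size))
                 + emission_logp(observed[0], s)
              for s in candidates(observed[0], vocab)}
    backpointers = []
    for word in observed[1:]:
        new_scores, pointers = {}, {}
        for s in candidates(word, vocab):
            # Best predecessor for state s at this position.
            best = max(scores, key=lambda p: scores[p]
                       + transition_logp(p, s, unigrams, bigrams, size))
            new_scores[s] = (scores[best]
                             + transition_logp(best, s, unigrams, bigrams, size)
                             + emission_logp(word, s))
            pointers[s] = best
        scores = new_scores
        backpointers.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(scores, key=scores.get)
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Toy data (hypothetical counts, as if taken from a tiny corpus of correct text).
vocab = sorted(["dobrý", "dobrá", "deň", "večer", "noc"])
unigrams = {"dobrý": 2, "dobrá": 1, "deň": 1, "večer": 1, "noc": 1}
bigrams = {("dobrý", "deň"): 1, ("dobrý", "večer"): 1, ("dobrá", "noc"): 1}

corrected = viterbi_correct(["dobry", "den"], vocab, unigrams, bigrams)
print(corrected)
```

In this sketch the bigram context is what separates "dobrý" from "dobrá", since both are one edit away from the input "dobry". A realistic system would estimate the counts from a large corpus and use a more careful error model than a flat per-edit penalty.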