Automatic Correction of Real-Word Errors in Spanish Clinical Texts

Real-word errors are errors that result in actual dictionary words, so they can be detected only by considering context. Traditional methods to detect and correct such errors are mostly based on counting the frequency of short word sequences in a corpus. Then, the probability of a word being...

Bibliographic Details
Main Authors:	Daniel Bravo-Candel, Jésica López-Hernández, José Antonio García-Díaz, Fernando Molina-Molina, Francisco García-Sánchez
Format:	Article
Language:	English
Published:	MDPI AG 2021-04-01
Series:	Sensors
Subjects:	error correction real-word error seq2seq neural machine translation model clinical texts word embeddings natural language processing
Online Access:	https://www.mdpi.com/1424-8220/21/9/2893

_version_	1797536965450530816
author	Daniel Bravo-Candel Jésica López-Hernández José Antonio García-Díaz Fernando Molina-Molina Francisco García-Sánchez
author_facet	Daniel Bravo-Candel Jésica López-Hernández José Antonio García-Díaz Fernando Molina-Molina Francisco García-Sánchez
author_sort	Daniel Bravo-Candel
collection	DOAJ
description	Real-word errors are errors that result in actual dictionary words, so they can be detected only by considering context. Traditional methods to detect and correct such errors are mostly based on counting the frequency of short word sequences in a corpus, from which the probability of a word being a real-word error is computed. State-of-the-art approaches, on the other hand, use deep learning models to learn context by extracting semantic features from text. In this work, a deep learning model was implemented for correcting real-word errors in clinical text. Specifically, a Seq2seq Neural Machine Translation Model mapped erroneous sentences to their corrected counterparts. To this end, different types of errors were generated in correct sentences by using rules. Different Seq2seq models were trained and evaluated on two corpora: the Wikicorpus and a collection of three clinical datasets. The medicine corpus was much smaller than the Wikicorpus due to privacy issues when dealing with patient information. Moreover, GloVe and Word2Vec pretrained word embeddings were used to study their performance. Despite the medicine corpus being much smaller than the Wikicorpus, Seq2seq models trained on the medicine corpus performed better than those trained on the Wikicorpus. Nevertheless, a larger amount of clinical text is required to improve the results.
first_indexed	2024-03-10T12:08:20Z
format	Article
id	doaj.art-235f66c09b68440894ba0d6c798ef67e
institution	Directory Open Access Journal
issn	1424-8220
language	English
last_indexed	2024-03-10T12:08:20Z
publishDate	2021-04-01
publisher	MDPI AG
record_format	Article
series	Sensors
spelling	doaj.art-235f66c09b68440894ba0d6c798ef67e 2023-11-21T16:25:01Z eng MDPI AG Sensors 1424-8220 2021-04-01 21 9 2893 10.3390/s21092893 Automatic Correction of Real-Word Errors in Spanish Clinical Texts Daniel Bravo-Candel 0 Jésica López-Hernández 1 José Antonio García-Díaz 2 Fernando Molina-Molina 3 Francisco García-Sánchez 4 Department of Informatics and Systems, Faculty of Computer Science, Campus de Espinardo, University of Murcia, 30100 Murcia, Spain Department of Informatics and Systems, Faculty of Computer Science, Campus de Espinardo, University of Murcia, 30100 Murcia, Spain Department of Informatics and Systems, Faculty of Computer Science, Campus de Espinardo, University of Murcia, 30100 Murcia, Spain VÓCALI Sistemas Inteligentes S.L., 30100 Murcia, Spain Department of Informatics and Systems, Faculty of Computer Science, Campus de Espinardo, University of Murcia, 30100 Murcia, Spain Real-word errors are errors that result in actual dictionary words, so they can be detected only by considering context. Traditional methods to detect and correct such errors are mostly based on counting the frequency of short word sequences in a corpus, from which the probability of a word being a real-word error is computed. State-of-the-art approaches, on the other hand, use deep learning models to learn context by extracting semantic features from text. In this work, a deep learning model was implemented for correcting real-word errors in clinical text. Specifically, a Seq2seq Neural Machine Translation Model mapped erroneous sentences to their corrected counterparts. To this end, different types of errors were generated in correct sentences by using rules. Different Seq2seq models were trained and evaluated on two corpora: the Wikicorpus and a collection of three clinical datasets. The medicine corpus was much smaller than the Wikicorpus due to privacy issues when dealing with patient information. Moreover, GloVe and Word2Vec pretrained word embeddings were used to study their performance. Despite the medicine corpus being much smaller than the Wikicorpus, Seq2seq models trained on the medicine corpus performed better than those trained on the Wikicorpus. Nevertheless, a larger amount of clinical text is required to improve the results. https://www.mdpi.com/1424-8220/21/9/2893 error correction real-word error seq2seq neural machine translation model clinical texts word embeddings natural language processing
spellingShingle	Daniel Bravo-Candel Jésica López-Hernández José Antonio García-Díaz Fernando Molina-Molina Francisco García-Sánchez Automatic Correction of Real-Word Errors in Spanish Clinical Texts Sensors error correction real-word error seq2seq neural machine translation model clinical texts word embeddings natural language processing
title	Automatic Correction of Real-Word Errors in Spanish Clinical Texts
title_full	Automatic Correction of Real-Word Errors in Spanish Clinical Texts
title_fullStr	Automatic Correction of Real-Word Errors in Spanish Clinical Texts
title_full_unstemmed	Automatic Correction of Real-Word Errors in Spanish Clinical Texts
title_short	Automatic Correction of Real-Word Errors in Spanish Clinical Texts
title_sort	automatic correction of real word errors in spanish clinical texts
topic	error correction real-word error seq2seq neural machine translation model clinical texts word embeddings natural language processing
url	https://www.mdpi.com/1424-8220/21/9/2893
work_keys_str_mv	AT danielbravocandel automaticcorrectionofrealworderrorsinspanishclinicaltexts AT jesicalopezhernandez automaticcorrectionofrealworderrorsinspanishclinicaltexts AT joseantoniogarciadiaz automaticcorrectionofrealworderrorsinspanishclinicaltexts AT fernandomolinamolina automaticcorrectionofrealworderrorsinspanishclinicaltexts AT franciscogarciasanchez automaticcorrectionofrealworderrorsinspanishclinicaltexts

Similar Items