A Transformer-Based Educational Virtual Assistant Using Diacriticized Latin Script
A virtual assistant or smart chatbot should be able to understand user questions and respond correctly and usefully, even if the questions are posed ungrammatically with misspellings and other errors. This paper describes the design and construction of a text-to-text virtual assistant in Vietnamese,...
Main Authors: | Khang Nhut Lam, Loc Huu Nguy, Van Lam Le, Jugal Kalita |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2023-01-01 |
Series: | IEEE Access |
Subjects: | Chatbot, diacritic restoration, educational chatbot, misspelling, transformer, virtual assistant |
Online Access: | https://ieeexplore.ieee.org/document/10225737/ |
_version_ | 1827856443466842112 |
---|---|
author | Khang Nhut Lam, Loc Huu Nguy, Van Lam Le, Jugal Kalita |
author_facet | Khang Nhut Lam, Loc Huu Nguy, Van Lam Le, Jugal Kalita |
author_sort | Khang Nhut Lam |
collection | DOAJ |
description | A virtual assistant or smart chatbot should be able to understand user questions and respond correctly and usefully, even if the questions are posed ungrammatically with misspellings and other errors. This paper describes the design and construction of a text-to-text virtual assistant in Vietnamese, a language that uses the Latin script with a liberal use of diacritics, for supporting students at a large university with over forty thousand students. The flexible virtual assistant consists of two integrated chatbots, both using Transformers: a) a closed-domain chatbot, trained on over thirty-five thousand factual question-answer pairs to engage in university-related conversation, and b) a second open-domain chatbot, trained on a large movie dialog dataset to engage in general conversation. The integrated virtual assistant classifies a question as either factual or general, and engages the appropriate chatbot to respond in a flexible, appropriate and natural manner. Although Vietnamese uses diacritics copiously, even educated users have a propensity to forgo the use of diacritics, and as a result, to facilitate smooth text-based communication, our design includes extensive pre-processing that uses learned Transformers to restore missing diacritics and correct misspellings. Our Transformer models outperform existing approaches for diacritic restoration and are better than several other methods for spelling correction in Vietnamese. In addition, the closed-domain chatbot performs better than other generative chatbots that have been developed to assist students in a university environment, irrespective of language and location. |
first_indexed | 2024-03-12T12:24:49Z |
format | Article |
id | doaj.art-9171749cb1a84aa5a6c02605994dde70 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-03-12T12:24:49Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-9171749cb1a84aa5a6c02605994dde70; 2023-08-29T23:00:41Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2023-01-01; vol. 11; pp. 90094-90104; DOI 10.1109/ACCESS.2023.3307635; document 10225737; A Transformer-Based Educational Virtual Assistant Using Diacriticized Latin Script; Khang Nhut Lam (https://orcid.org/0000-0003-1103-5578), Department of Information Technology, Can Tho University, Can Tho, Vietnam; Loc Huu Nguy, Department of Information Technology, Can Tho University, Can Tho, Vietnam; Van Lam Le, Department of Information Technology, Can Tho University, Can Tho, Vietnam; Jugal Kalita (https://orcid.org/0000-0002-8765-7018), Department of Computer Science, University of Colorado, Colorado Springs, CO, USA; https://ieeexplore.ieee.org/document/10225737/; Chatbot; diacritic restoration; educational chatbot; misspelling; transformer; virtual assistant |
title | A Transformer-Based Educational Virtual Assistant Using Diacriticized Latin Script |
topic | Chatbot, diacritic restoration, educational chatbot, misspelling, transformer, virtual assistant |
url | https://ieeexplore.ieee.org/document/10225737/ |