A Transformer-Based Educational Virtual Assistant Using Diacriticized Latin Script

A virtual assistant or smart chatbot should be able to understand user questions and respond correctly and usefully, even if the questions are posed ungrammatically with misspellings and other errors. This paper describes the design and construction of a text-to-text virtual assistant in Vietnamese,...

Bibliographic Details
Main Authors: Khang Nhut Lam, Loc Huu Nguy, Van Lam Le, Jugal Kalita
Format: Article
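Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Subjects: Chatbot, diacritic restoration, educational chatbot, misspelling, transformer, virtual assistant
Online Access: https://ieeexplore.ieee.org/document/10225737/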
author Khang Nhut Lam
Loc Huu Nguy
Van Lam Le
Jugal Kalita
author_sort Khang Nhut Lam
collection DOAJ
description A virtual assistant or smart chatbot should be able to understand user questions and respond correctly and usefully, even if the questions are posed ungrammatically with misspellings and other errors. This paper describes the design and construction of a text-to-text virtual assistant in Vietnamese, a language that uses the Latin script with liberal use of diacritics, built to support students at a large university with an enrollment of over forty thousand. The virtual assistant consists of two integrated chatbots, both using Transformers: a) a closed-domain chatbot, trained on over thirty-five thousand factual question-answer pairs to engage in university-related conversation, and b) an open-domain chatbot, trained on a large movie dialog dataset to engage in general conversation. The integrated virtual assistant classifies each question as either factual or general and engages the appropriate chatbot to respond in a flexible and natural manner. Although Vietnamese uses diacritics copiously, even educated users tend to omit them. To facilitate smooth text-based communication, our design therefore includes extensive pre-processing that uses trained Transformer models to restore missing diacritics and correct misspellings. Our Transformer models outperform existing approaches for diacritic restoration and are better than several other methods for spelling correction in Vietnamese. In addition, the closed-domain chatbot performs better than other generative chatbots that have been developed to assist students in a university environment, irrespective of language and location.
first_indexed 2024-03-12T12:24:49Z
format Article
id doaj.art-9171749cb1a84aa5a6c02605994dde70
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-03-12T12:24:49Z
publishDate 2023-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-9171749cb1a84aa5a6c02605994dde70
2023-08-29T23:00:41Z
eng
IEEE
IEEE Access
ISSN 2169-3536
2023-01-01
Volume 11, pages 90094-90104
DOI 10.1109/ACCESS.2023.3307635
Article number 10225737
A Transformer-Based Educational Virtual Assistant Using Diacriticized Latin Script
Khang Nhut Lam (https://orcid.org/0000-0003-1103-5578), Department of Information Technology, Can Tho University, Can Tho, Vietnam
Loc Huu Nguy, Department of Information Technology, Can Tho University, Can Tho, Vietnam
Van Lam Le, Department of Information Technology, Can Tho University, Can Tho, Vietnam
Jugal Kalita (https://orcid.org/0000-0002-8765-7018), Department of Computer Science, University of Colorado, Colorado Springs, CO, USA
title A Transformer-Based Educational Virtual Assistant Using Diacriticized Latin Script
title_sort transformer based educational virtual assistant using diacriticized latin script
topic Chatbot
diacritic restoration
educational chatbot
misspelling
transformer
virtual assistant
url https://ieeexplore.ieee.org/document/10225737/