MULTI-DOMAIN MACHINE LEARNING APPROACH OF NAMED ENTITY RECOGNITION FOR ARABIC BOOKING CHATBOT ENGINES USING PRE-TRAINED BIDIRECTIONAL TRANSFORMERS

Chatbots have recently become essential in various fields, ranging from customer service and information acquisition to entertainment. The use of chatbots reduces operational costs and human errors while providing services at any time. This work presents a Named Entity Recognition (NER) model for th...

Bibliographic Details
Main Authors: Boshra Taha Sadder, Rahma Taha Sadder, Gheith Abandah, Iyad Jafar
Format: Article
Language: English
Published: Scientific Research Support Fund of Jordan (SRSF) and Princess Sumaya University for Technology (PSUT) 2024-03-01
Series: Jordanian Journal of Computers and Information Technology
Subjects: chatbot, arabic booking chatbot, named entity recognition, arabert, arabic booking dataset
Online Access: https://www.jjcit.org/?mno=169089
author Boshra Taha Sadder
Rahma Taha Sadder
Gheith Abandah
Iyad Jafar
collection DOAJ
description Chatbots have recently become essential in various fields, ranging from customer service and information acquisition to entertainment. The use of chatbots reduces operational costs and human errors while providing services at any time. This work presents a Named Entity Recognition (NER) model for Arabic booking chatbots, focusing on booking tickets and appointments across multiple domains. This research paves the way for the development of chatbots that can support multiple booking domains, contributing to the advancement of the Arabic language in this field. We adopt deep machine learning and transfer learning approaches to solve this task. Specifically, we fine-tuned the AraBERTv0.2 base model to develop the Named Entity Recognition for Booking Queries (NERB) model. Furthermore, we extended it to the Domain-Aware Named Entity Recognition for Booking Queries (DA-NERB) model by adding an input for the domain type and an embedding layer. The input to our proposed model consists of text sequences of reservation requests, while the output consists of sequences of tags representing the entities within the input sequences. For training and testing, we synthesized the Arabic Booking Chatbot-Synthetic Dataset (ABC-S Dataset), comprising 76,117 reservation samples that span seven domains and encompass 26 categories of named entities. Additionally, we collected the Arabic Booking Chatbot-Collected Dataset (ABC-C Dataset) from volunteers to evaluate our model on varied samples. These datasets are written in informal Arabic, specifically the Levantine dialect. The proposed model achieves accuracy scores of 100% and 96.9% on ABC-S (test set) and ABC-C, respectively. Both the datasets and the code for our model are publicly available to support research in the field of Arabic chatbots. [JJCIT 2024; 10(1): 1-16]
first_indexed 2024-03-07T19:12:08Z
format Article
id doaj.art-43c2058c7fb14e85b9ffa947660785ae
institution Directory of Open Access Journals
issn 2413-9351
2415-1076
language English
last_indexed 2024-03-07T19:12:08Z
publishDate 2024-03-01
publisher Scientific Research Support Fund of Jordan (SRSF) and Princess Sumaya University for Technology (PSUT)
record_format Article
series Jordanian Journal of Computers and Information Technology
spelling doaj.art-43c2058c7fb14e85b9ffa947660785ae
2024-02-29T23:32:38Z
eng
Scientific Research Support Fund of Jordan (SRSF) and Princess Sumaya University for Technology (PSUT)
Jordanian Journal of Computers and Information Technology
2413-9351
2415-1076
2024-03-01
Volume 10, Issue 1, pp. 1-16
DOI 10.5455/jjcit.71-1694435791
169089
MULTI-DOMAIN MACHINE LEARNING APPROACH OF NAMED ENTITY RECOGNITION FOR ARABIC BOOKING CHATBOT ENGINES USING PRE-TRAINED BIDIRECTIONAL TRANSFORMERS
Boshra Taha Sadder (The University of Jordan)
Rahma Taha Sadder (The University of Jordan)
Gheith Abandah (The University of Jordan)
Iyad Jafar (The University of Jordan)
https://www.jjcit.org/?mno=169089
chatbot
arabic booking chatbot
named entity recognition
arabert
arabic booking dataset
title MULTI-DOMAIN MACHINE LEARNING APPROACH OF NAMED ENTITY RECOGNITION FOR ARABIC BOOKING CHATBOT ENGINES USING PRE-TRAINED BIDIRECTIONAL TRANSFORMERS
topic chatbot
arabic booking chatbot
named entity recognition
arabert
arabic booking dataset
url https://www.jjcit.org/?mno=169089