Sentiment Analysis of Emirati Dialect

Recently, extensive studies and research in the Arabic Natural Language Processing (ANLP) field have been conducted for text classification and sentiment analysis. Moreover, the number of studies that target Arabic dialects has also increased. In this research paper, we constructed the first manuall...


Bibliographic Details
Main Authors: Arwa A. Al Shamsi, Sherief Abdallah
Format: Article
Language: English
Published: MDPI AG 2022-05-01
Series: Big Data and Cognitive Computing
Subjects: corpus; Emirati dataset; Arabic dialects; sentiment analysis; classification; classifiers
Online Access: https://www.mdpi.com/2504-2289/6/2/57
author Arwa A. Al Shamsi
Sherief Abdallah
collection DOAJ
description Arabic Natural Language Processing (ANLP) has recently seen extensive research on text classification and sentiment analysis, and the number of studies targeting Arabic dialects has also increased. In this paper, we constructed the first manually annotated dataset of the Emirati dialect for the Instagram platform. The dataset consists of more than 70,000 comments, mostly written in the Emirati dialect, of which 70,000 were annotated for text polarity as positive, negative, or neutral. The dataset was also annotated for dialect type, categorized as Emirati dialect, Arabic dialects, or Modern Standard Arabic (MSA). Preprocessing and TF-IDF feature extraction were applied to the Emirati dataset to prepare it for the sentiment analysis experiments and to improve classification performance. The experiments were carried out on both balanced and unbalanced datasets using several machine learning classifiers, and were evaluated with accuracy, recall, precision, and F-measure. The best accuracy, 80.80%, was achieved when the ensemble model was applied to sentiment classification of the unbalanced dataset.
first_indexed 2024-03-10T00:24:41Z
format Article
id doaj.art-5569c6d494e549b3af4b7cdd963b6b37
institution Directory of Open Access Journals
issn 2504-2289
language English
last_indexed 2024-03-10T00:24:41Z
publishDate 2022-05-01
publisher MDPI AG
record_format Article
series Big Data and Cognitive Computing
spelling doaj.art-5569c6d494e549b3af4b7cdd963b6b37; 2023-11-23T15:36:23Z; eng; MDPI AG; Big Data and Cognitive Computing; ISSN 2504-2289; 2022-05-01; volume 6, issue 2, article 57; DOI 10.3390/bdcc6020057; Sentiment Analysis of Emirati Dialect; Arwa A. Al Shamsi (Faculty of Engineering and IT, The British University in Dubai, Dubai P.O. Box 345015, United Arab Emirates); Sherief Abdallah (Faculty of Engineering and IT, The British University in Dubai, Dubai P.O. Box 345015, United Arab Emirates); https://www.mdpi.com/2504-2289/6/2/57; corpus; Emirati dataset; Arabic dialects; sentiment analysis; classification; classifiers
title Sentiment Analysis of Emirati Dialect
topic corpus
Emirati dataset
Arabic dialects
sentiment analysis
classification
classifiers
url https://www.mdpi.com/2504-2289/6/2/57
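
The description in this record names an ensemble model and four evaluation metrics (accuracy, recall, precision, and F-measure) over three polarity classes. As a rough illustration of how such figures can be computed, the sketch below combines the outputs of three classifiers by majority vote and scores the result. It is not the authors' code: the record does not say which kind of ensemble or which averaging scheme was used, so hard voting and macro averaging are assumptions, and the function names, model outputs, and labels are all hypothetical toy values.

```python
from collections import Counter

LABELS = ("positive", "negative", "neutral")


def majority_vote(predictions_per_model):
    """Hard-voting ensemble (an assumption): keep the most common label per comment.

    On a tie, Counter.most_common returns the label seen first, so the
    earliest model in the list wins.
    """
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*predictions_per_model)]


def evaluate(y_true, y_pred, labels=LABELS):
    """Return accuracy, per-class (precision, recall, F-measure), and macro F-measure."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    per_class = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        per_class[label] = (precision, recall, f_measure)
    macro_f = sum(f for _, _, f in per_class.values()) / len(labels)
    return accuracy, per_class, macro_f


# Hypothetical gold labels and outputs of three hypothetical classifiers.
y_true = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
model_a = ["positive", "negative", "neutral", "positive", "positive", "neutral"]
model_b = ["positive", "negative", "positive", "positive", "negative", "positive"]
model_c = ["negative", "negative", "neutral", "neutral", "negative", "positive"]

y_pred = majority_vote([model_a, model_b, model_c])
accuracy, per_class, macro_f = evaluate(y_true, y_pred)

if __name__ == "__main__":
    print(f"accuracy={accuracy:.4f} macro_f={macro_f:.4f}")
    for label, (p, r, f) in per_class.items():
        print(f"{label}: precision={p:.4f} recall={r:.4f} f_measure={f:.4f}")
```

On an unbalanced dataset, accuracy alone can look good while a minority class is poorly recognized, which is one reason to inspect the per-class precision and recall next to the overall figure.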