Sentiment Analysis of Emirati Dialect

Bibliographic Details
Main Authors:	Arwa A. Al Shamsi, Sherief Abdallah
Format:	Article
Language:	English
Published:	MDPI AG 2022-05-01
Series:	Big Data and Cognitive Computing
Subjects:	corpus; Emirati dataset; Arabic dialects; sentiment analysis; classification; classifiers
Online Access:	https://www.mdpi.com/2504-2289/6/2/57
Affiliation:	Faculty of Engineering and IT, The British University in Dubai, Dubai P.O. Box 345015, United Arab Emirates (both authors)
Volume/Issue:	Vol. 6, No. 2, Article 57
DOI:	10.3390/bdcc6020057
ISSN:	2504-2289
Indexed in:	DOAJ (Directory of Open Access Journals)

Description
Research in Arabic Natural Language Processing (ANLP) on text classification and sentiment analysis has grown substantially in recent years, and the number of studies targeting Arabic dialects has increased as well. In this paper, we constructed the first manually annotated dataset of the Emirati dialect collected from Instagram. The dataset consists of more than 70,000 comments, most of them written in the Emirati dialect. A total of 70,000 comments were annotated for polarity (positive, negative, or neutral). The dataset was also annotated for dialect type: Emirati dialect, other Arabic dialects, or Modern Standard Arabic (MSA). Preprocessing and TF-IDF feature extraction were applied to prepare the dataset for the sentiment analysis experiments and to improve classification performance. The experiments were carried out on both balanced and unbalanced versions of the dataset using several machine learning classifiers and were evaluated in terms of accuracy, precision, recall, and F-measure. The best accuracy, 80.80%, was achieved by an ensemble model on the unbalanced dataset.
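The abstract names preprocessing and TF-IDF feature extraction but does not specify either step. The following is a minimal pure-Python sketch under stated assumptions: a few common Arabic normalization steps (removing diacritics and tatweel, unifying alef variants, stripping punctuation and emoji) and the smoothed IDF weighting that scikit-learn's `TfidfVectorizer` applies by default. The helper names `preprocess`, `fit_idf`, and `tfidf_vector` are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical normalization rules: the record does not list the paper's
# actual preprocessing steps.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")   # tashkeel marks and tatweel
ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")  # alef with madda/hamza
NON_WORD = re.compile(r"[^\w\s]")                   # punctuation, emoji, symbols


def preprocess(text):
    """Normalize an Arabic comment and split it into whitespace tokens."""
    text = DIACRITICS.sub("", text)
    text = ALEF_VARIANTS.sub("\u0627", text)  # unify to bare alef
    text = NON_WORD.sub(" ", text)
    return text.split()


def fit_idf(token_lists):
    """Smoothed IDF per term: ln((1 + n) / (1 + df)) + 1."""
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}


def tfidf_vector(tokens, idf):
    """Raw term counts times IDF, L2-normalized, as a sparse dict.
    Terms missing from the fitted vocabulary are ignored."""
    counts = Counter(t for t in tokens if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}
```

In use, `fit_idf` would be fitted on the preprocessed training comments only, and the same `idf` table reused to vectorize the test comments, so that document frequencies from the test set do not leak into training.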
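The experiments were run on both balanced and unbalanced versions of the dataset, but the record does not say how the balanced version was produced. Random undersampling is one common option, and this sketch assumes it; oversampling the minority classes would be an equally plausible alternative. The function `undersample` and its `seed` parameter are hypothetical.

```python
import random
from collections import defaultdict


def undersample(texts, labels, seed=0):
    """Random undersampling: keep `m` randomly chosen examples per class,
    where `m` is the size of the smallest class."""
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)
    m = min(len(items) for items in by_class.values())
    rng = random.Random(seed)  # fixed seed so the balanced split is reproducible
    pairs = [(text, label)
             for label, items in by_class.items()
             for text in rng.sample(items, m)]
    rng.shuffle(pairs)  # avoid class-sorted order in the output
    return [text for text, _ in pairs], [label for _, label in pairs]
```

The sketch is intended for a training split; the record does not state whether the paper balanced the whole dataset or only the portion used for training.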
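The abstract reports accuracy, precision, recall, and F-measure, with the best accuracy (80.80%) coming from an ensemble model, but the record does not say which base classifiers were combined or how. As an illustration only, assuming a hard (majority) voting ensemble and macro-averaged metrics over the three polarity classes, the combination step and the evaluation can be sketched as follows. `majority_vote` and `evaluate` are hypothetical helpers; the paper may have used soft voting, stacking, or a different averaging scheme.

```python
from collections import Counter


def majority_vote(*predictions):
    """Hard voting over per-classifier label lists. Ties go to the tied label
    that appears first in classifier order (Counter keeps insertion order)."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def evaluate(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F-measure."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f_measure": sum(f1s) / n,
    }
```

Macro averaging weights every class equally, so on an unbalanced dataset it can differ noticeably from accuracy, which is dominated by the majority class.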

Similar Items