Sentiment Analysis of Emirati Dialect
Main Authors: | Arwa A. Al Shamsi, Sherief Abdallah |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-05-01 |
Series: | Big Data and Cognitive Computing |
Subjects: | corpus; Emirati dataset; Arabic dialects; sentiment analysis; classification; classifiers |
Online Access: | https://www.mdpi.com/2504-2289/6/2/57 |
_version_ | 1797489931279400960 |
---|---|
author | Arwa A. Al Shamsi; Sherief Abdallah |
author_sort | Arwa A. Al Shamsi |
collection | DOAJ |
description | Arabic Natural Language Processing (ANLP) has recently seen extensive research on text classification and sentiment analysis, and the number of studies targeting Arabic dialects has also grown. In this paper, we construct the first manually annotated dataset of the Emirati dialect for the Instagram platform. The dataset consists of more than 70,000 comments, mostly written in the Emirati dialect. We annotated 70,000 comments for text polarity (positive, negative, or neutral) and for dialect type (Emirati dialect, Arabic dialects, or Modern Standard Arabic, MSA). Preprocessing and TF-IDF feature extraction were applied to prepare the dataset for the sentiment analysis experiments and to improve classification performance. The experiments were carried out on both balanced and unbalanced datasets using several machine learning classifiers and were evaluated by accuracy, recall, precision, and F-measure. The best accuracy, 80.80%, was achieved when the ensemble model was applied to sentiment classification on the unbalanced dataset. |
first_indexed | 2024-03-10T00:24:41Z |
format | Article |
id | doaj.art-5569c6d494e549b3af4b7cdd963b6b37 |
institution | Directory of Open Access Journals |
issn | 2504-2289 |
language | English |
last_indexed | 2024-03-10T00:24:41Z |
publishDate | 2022-05-01 |
publisher | MDPI AG |
record_format | Article |
series | Big Data and Cognitive Computing |
spelling | doaj.art-5569c6d494e549b3af4b7cdd963b6b37; 2023-11-23T15:36:23Z; eng; MDPI AG; Big Data and Cognitive Computing; ISSN 2504-2289; 2022-05-01; vol. 6, no. 2, article 57; DOI 10.3390/bdcc6020057; Sentiment Analysis of Emirati Dialect; Arwa A. Al Shamsi and Sherief Abdallah (both: Faculty of Engineering and IT, The British University in Dubai, Dubai P.O. Box 345015, United Arab Emirates); https://www.mdpi.com/2504-2289/6/2/57; corpus; Emirati dataset; Arabic dialects; sentiment analysis; classification; classifiers |
title | Sentiment Analysis of Emirati Dialect |
topic | corpus; Emirati dataset; Arabic dialects; sentiment analysis; classification; classifiers |
url | https://www.mdpi.com/2504-2289/6/2/57 |