A machine learning-based approach for sentiment analysis on distance learning from Arabic Tweets

Bibliographic Details
Main Author: Jameel Almalki
Format: Article
Language: English
Published: PeerJ Inc., 2022-07-01
Series: PeerJ Computer Science
Subjects: Sentiment analysis; Social media; E-Learning; Twitter; Apache Spark; Arabic language
Online Access: https://peerj.com/articles/cs-1047.pdf
ISSN: 2376-5992
DOI: 10.7717/peerj-cs.1047
Volume: 8, article e1047
Affiliation: Department of Computer Science, College of Computer in Al-Leith, Umm Al-Qura University, Makkah, Saudi Arabia
Source: Directory of Open Access Journals (DOAJ)

Description
Social media platforms such as Twitter, YouTube, Instagram, and Facebook are now leading sources of large datasets. Twitter data is among the most reliable of these sources because of the platform's privacy policy. Tweets have been used for sentiment analysis and to identify meaningful information within datasets. This study focuses on distance learning in Saudi Arabia by analyzing Arabic tweets on the topic. It proposes a model for analyzing people's feedback using a Twitter dataset from the distance learning domain. The model is built on Apache Spark to manage the large dataset. It uses the Twitter API to collect tweets as raw data, which are stored on an Apache Spark server. A regex-based preprocessing step removes retweets, links, hashtags, English words and numbers, usernames, and emojis from the dataset. A Logistic Regression model is then trained on the preprocessed data and used to predict the sentiment of each tweet. Finally, a Flask application is built for sentiment analysis of Arabic tweets. The proposed model outperforms the other techniques it is compared with. Evaluated on test data, it achieves an Accuracy of 91%, an F1 Score of 90%, a Precision of 90%, and a Recall of 89%.
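The regex-based preprocessing step described in the abstract can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the paper's code: the record does not give the actual regular expressions, so every pattern below, and the helper names `clean_tweet` and `preprocess`, are hypothetical. It drops retweets from the dataset, then strips links, usernames, hashtags, Latin-script words and ASCII digits, and common emoji code-point ranges, keeping only the Arabic text.

```python
import re

# Hypothetical patterns: the paper's exact regexes are not published in this record.
RETWEET = re.compile(r"^\s*RT\s+@")              # classic "RT @user:" retweet prefix
URL = re.compile(r"https?://\S+|www\.\S+")       # links
MENTION = re.compile(r"@[A-Za-z0-9_]+")          # @usernames (Twitter handles are ASCII)
HASHTAG = re.compile(r"#\S+")                    # hashtags, Arabic or Latin
LATIN_AND_DIGITS = re.compile(r"[A-Za-z0-9]+")   # English words and ASCII numbers
EMOJI = re.compile(
    "["
    "\U0001F1E6-\U0001F1FF"  # regional-indicator (flag) letters
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, supplemental symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F\u200D"           # emoji variation selector and zero-width joiner
    "]+"
)
WHITESPACE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Replace links, mentions, hashtags, Latin words/digits and emojis with spaces."""
    # URLs go first so that '#' or '@' inside a link is not matched separately.
    for pattern in (URL, MENTION, HASHTAG, LATIN_AND_DIGITS, EMOJI):
        text = pattern.sub(" ", text)
    return WHITESPACE.sub(" ", text).strip()


def preprocess(tweets):
    """Drop retweets, clean the remaining tweets, and discard tweets left empty."""
    cleaned = []
    for tweet in tweets:
        if RETWEET.match(tweet):
            continue  # retweets are removed from the dataset entirely
        text = clean_tweet(tweet)
        if text:
            cleaned.append(text)
    return cleaned


if __name__ == "__main__":
    sample = [
        "RT @user1: التعليم عن بعد تجربة رائعة",
        "التعليم عن بعد ممتاز 👍 https://t.co/abc123 @ministry #تعليم_عن_بعد",
        "Online classes 2020 😀",
    ]
    print(preprocess(sample))  # → ['التعليم عن بعد ممتاز']
```

Matching emoji by code-point ranges keeps the pattern short but is not exhaustive; a maintained emoji list would be more complete. In a Spark-based pipeline like the one described, a function of this kind would typically run over the stored tweets before the Logistic Regression model is trained, though how the paper wires it in is not stated here.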