A Machine Learning-Sentiment Analysis on Monkeypox Outbreak: An Extensive Dataset to Show the Polarity of Public Opinion From Twitter Tweets

Research on sentiment analysis has proven to be very useful in public health, particularly in analyzing infectious diseases. As the world recovers from the onslaught of the COVID-19 pandemic, concerns are rising that another pandemic, known as monkeypox, might hit the world again. Monkeypox is an in...

Bibliographic Details
Main Authors: Staphord Bengesi, Timothy Oladunni, Ruth Olusegun, Halima Audu
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Subjects: Count vectorizer; machine learning algorithm; monkeypox; sentiment analysis; twitter; TF-IDF
Online Access: https://ieeexplore.ieee.org/document/10036414/
author Staphord Bengesi
Timothy Oladunni
Ruth Olusegun
Halima Audu
collection DOAJ
description Research on sentiment analysis has proven to be very useful in public health, particularly in analyzing infectious diseases. As the world recovers from the onslaught of the COVID-19 pandemic, concerns are rising that another pandemic, known as monkeypox, might hit the world again. Monkeypox is an infectious disease reported in over 73 countries across the globe. This sudden outbreak has become a major concern for many individuals and health authorities. Different social media channels have presented discussions, views, opinions, and emotions about the monkeypox outbreak. Social media sentiments often result in panic, misinformation, and stigmatization of some minority groups. Therefore, accurate information, guidelines, and health protocols related to this virus are critical. We aim to analyze public sentiments on the recent monkeypox outbreak, with the purpose of helping decision-makers gain a better understanding of public perceptions of the disease. We hope that government and health authorities will find the work useful in crafting health policies and mitigation strategies to control the spread of the disease and to guard against its misrepresentation. Our study was conducted in two stages. In the first stage, we collected over 500,000 multilingual monkeypox-related tweets posted on Twitter and then performed sentiment analysis on them using VADER and TextBlob, annotating the extracted tweets as positive, negative, or neutral. The second stage of our study involved the design, development, and evaluation of 56 classification models. Stemming and lemmatization techniques were used for vocabulary normalization. Vectorization was based on CountVectorizer and TF-IDF methodologies. K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Random Forest, Logistic Regression, Multilayer Perceptron (MLP), Naïve Bayes, and XGBoost were deployed as learning algorithms. Performance evaluation was based on accuracy, F1 score, precision, and recall. Our experimental results showed that the model developed using TextBlob annotation + Lemmatization + CountVectorizer + SVM yielded the highest accuracy of about 0.9348.
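The count of 56 models in the description is consistent with crossing the two annotation tools, two normalization techniques, two vectorizers, and seven learning algorithms it names (2 x 2 x 2 x 7 = 56). The record does not state this breakdown explicitly, so the sketch below is only an illustration under that assumption. The option lists come from the description; the variable and function names are hypothetical, and no model is trained.

```python
from itertools import product

# Option lists taken from the description. Treating them as four
# independent choices is an assumption of this sketch, not a stated fact.
ANNOTATORS = ["VADER", "TextBlob"]
NORMALIZERS = ["Stemming", "Lemmatization"]
VECTORIZERS = ["CountVectorizer", "TF-IDF"]
ALGORITHMS = [
    "KNN", "SVM", "Random Forest", "Logistic Regression",
    "MLP", "Naive Bayes", "XGBoost",
]


def enumerate_configurations():
    """Return every (annotator, normalizer, vectorizer, algorithm) tuple."""
    return list(product(ANNOTATORS, NORMALIZERS, VECTORIZERS, ALGORITHMS))


def label(config):
    """Join a configuration in the 'A + B + C + D' style of the description."""
    return " + ".join(config)


configs = enumerate_configurations()
print(len(configs))  # number of candidate model configurations
for config in configs[:3]:
    print(label(config))
```

Under this reading, the best configuration reported in the description corresponds to the tuple ("TextBlob", "Lemmatization", "CountVectorizer", "SVM").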
first_indexed 2024-04-10T16:11:58Z
format Article
id doaj.art-cef134fde306495097b74a4521132374
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-04-10T16:11:58Z
publishDate 2023-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-cef134fde306495097b74a4521132374 | 2023-02-10T00:00:25Z | eng | IEEE | IEEE Access | 2169-3536 | 2023-01-01 | 11 | 11811 | 11826 | 10.1109/ACCESS.2023.3242290 | 10036414
Staphord Bengesi (0), https://orcid.org/0000-0002-1925-4196, Department of Computer Science, Bowie State University, Bowie, MD, USA
Timothy Oladunni (1), Department of Computer Science, Morgan State University, Baltimore, MD, USA
Ruth Olusegun (2), Department of Computer Science, Bowie State University, Bowie, MD, USA
Halima Audu (3), Department of Computer Science, Bowie State University, Bowie, MD, USA
title A Machine Learning-Sentiment Analysis on Monkeypox Outbreak: An Extensive Dataset to Show the Polarity of Public Opinion From Twitter Tweets
topic Count vectorizer
machine learning algorithm
monkeypox
sentiment analysis
twitter
TF-IDF
url https://ieeexplore.ieee.org/document/10036414/