A Machine Learning-Sentiment Analysis on Monkeypox Outbreak: An Extensive Dataset to Show the Polarity of Public Opinion From Twitter Tweets
Research on sentiment analysis has proven to be very useful in public health, particularly in analyzing infectious diseases. As the world recovers from the onslaught of the COVID-19 pandemic, concerns are rising that another pandemic, known as monkeypox, might hit the world again. Monkeypox is an in...
Main Authors: | Staphord Bengesi, Timothy Oladunni, Ruth Olusegun, Halima Audu |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2023-01-01 |
Series: | IEEE Access |
Subjects: | Count vectorizer; machine learning algorithm; monkeypox; sentiment analysis; TF-IDF |
Online Access: | https://ieeexplore.ieee.org/document/10036414/ |
_version_ | 1811167527820066816 |
---|---|
author | Staphord Bengesi; Timothy Oladunni; Ruth Olusegun; Halima Audu |
author_facet | Staphord Bengesi; Timothy Oladunni; Ruth Olusegun; Halima Audu |
author_sort | Staphord Bengesi |
collection | DOAJ |
description | Research on sentiment analysis has proven to be very useful in public health, particularly in analyzing infectious diseases. As the world recovers from the onslaught of the COVID-19 pandemic, concerns are rising that another pandemic, known as monkeypox, might hit the world again. Monkeypox is an infectious disease reported in over 73 countries across the globe. This sudden outbreak has become a major concern for many individuals and health authorities. Different social media channels have presented discussions, views, opinions, and emotions about the monkeypox outbreak. Social media sentiments often result in panic, misinformation, and stigmatization of some minority groups. Therefore, accurate information, guidelines, and health protocols related to this virus are critical. We aim to analyze public sentiments on the recent monkeypox outbreak, with the purpose of helping decision-makers gain a better understanding of the public perceptions of the disease. We hope that government and health authorities will find the work useful in crafting health policies and mitigation strategies to control the spread of the disease, and guard against its misrepresentations. Our study was conducted in two stages. In the first stage, we collected over 500,000 multilingual tweets related to monkeypox posted on Twitter and then performed sentiment analysis on them using VADER and TextBlob, to annotate the extracted tweets into positive, negative, and neutral sentiments. The second stage of our study involved the design, development, and evaluation of 56 classification models. Stemming and lemmatization techniques were used for vocabulary normalization. Vectorization was based on CountVectorizer and TF-IDF methodologies. K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Random Forest, Logistic Regression, Multilayer Perceptron (MLP), Naïve Bayes, and XGBoost were deployed as learning algorithms. Performance evaluation was based on accuracy, F1 score, precision, and recall. Our experimental results showed that the model developed using TextBlob annotation + Lemmatization + CountVectorizer + SVM yielded the highest accuracy of about 0.9348. |
first_indexed | 2024-04-10T16:11:58Z |
format | Article |
id | doaj.art-cef134fde306495097b74a4521132374 |
institution | Directory of Open Access Journals |
issn | 2169-3536 |
language | English |
last_indexed | 2024-04-10T16:11:58Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-cef134fde306495097b74a4521132374; 2023-02-10T00:00:25Z; eng; IEEE; IEEE Access; 2169-3536; 2023-01-01; volume 11; pages 11811-11826; 10.1109/ACCESS.2023.3242290; 10036414; A Machine Learning-Sentiment Analysis on Monkeypox Outbreak: An Extensive Dataset to Show the Polarity of Public Opinion From Twitter Tweets; Staphord Bengesi (https://orcid.org/0000-0002-1925-4196; Department of Computer Science, Bowie State University, Bowie, MD, USA); Timothy Oladunni (Department of Computer Science, Morgan State University, Baltimore, MD, USA); Ruth Olusegun (Department of Computer Science, Bowie State University, Bowie, MD, USA); Halima Audu (Department of Computer Science, Bowie State University, Bowie, MD, USA); https://ieeexplore.ieee.org/document/10036414/; Count vectorizer; machine learning algorithm; monkeypox; sentiment analysis; twitter; TF-IDF |
title | A Machine Learning-Sentiment Analysis on Monkeypox Outbreak: An Extensive Dataset to Show the Polarity of Public Opinion From Twitter Tweets |
title_sort | machine learning sentiment analysis on monkeypox outbreak an extensive dataset to show the polarity of public opinion from twitter tweets |
topic | Count vectorizer; machine learning algorithm; monkeypox; sentiment analysis; TF-IDF |
url | https://ieeexplore.ieee.org/document/10036414/ |
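
The description field reports 56 classification models but does not spell out how that count arises. One reading, offered here only as an assumption, is a full crossing of the options it names: 2 annotation tools, 2 normalization techniques, 2 vectorizers, and 7 learning algorithms. The sketch below enumerates that grid; the string labels and the function name `enumerate_configurations` are illustrative and are not taken from the article.

```python
from itertools import product

# Options named in the record's description field. Treating them as a fully
# crossed experimental grid is an assumption, not something the record states.
ANNOTATORS = ["VADER", "TextBlob"]
NORMALIZERS = ["stemming", "lemmatization"]
VECTORIZERS = ["CountVectorizer", "TF-IDF"]
CLASSIFIERS = [
    "KNN",
    "SVM",
    "Random Forest",
    "Logistic Regression",
    "MLP",
    "Naive Bayes",
    "XGBoost",
]


def enumerate_configurations():
    """Return every (annotator, normalizer, vectorizer, classifier) tuple."""
    return list(product(ANNOTATORS, NORMALIZERS, VECTORIZERS, CLASSIFIERS))


configs = enumerate_configurations()

# The configuration the description reports as most accurate (about 0.9348).
best = ("TextBlob", "lemmatization", "CountVectorizer", "SVM")

print(len(configs))     # 56, i.e. 2 * 2 * 2 * 7
print(best in configs)  # True
```

Under this reading, the reported best model is one cell of the grid, and each of the four metrics (accuracy, F1 score, precision, recall) would be computed once per cell.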