Tracking mosquito-borne diseases via social media: a machine learning approach to topic modelling and sentiment analysis
Mosquito-borne diseases (MBDs) are a major threat worldwide, and public consultation on these diseases is critical to disease control decision-making. However, traditional public surveys are time-consuming and labor-intensive and do not allow for timely decision-making. Recent studies have explored...
Main Authors: | Song-Quan Ong, Hamdan Ahmad |
---|---|
Format: | Article |
Language: | English |
Published: | PeerJ, Inc. 2024 |
Subjects: | RA1-418.5 Medicine and the state; RC109-216 Infectious and parasitic diseases |
Online Access: | https://eprints.ums.edu.my/id/eprint/41429/1/ABSTRACT.pdf https://eprints.ums.edu.my/id/eprint/41429/2/FULL%20TEXT.pdf |
_version_ | 1817926522828226560 |
---|---|
author | Song-Quan Ong Hamdan Ahmad |
author_facet | Song-Quan Ong Hamdan Ahmad |
author_sort | Song-Quan Ong |
collection | UMS |
description | Mosquito-borne diseases (MBDs) are a major threat worldwide, and public consultation on these diseases is critical to disease control decision-making. However, traditional public surveys are time-consuming and labor-intensive and do not allow for timely decision-making. Recent studies have explored text analytic approaches to elicit public comments from social media for public health. Therefore, this study aims to demonstrate a text analytics pipeline to identify the MBD topics that were discussed on Twitter and significantly influenced public opinion. A total of 25,000 tweets were retrieved from Twitter, topics were modelled using LDA and sentiment polarities were calculated using the VADER model. After data cleaning, we obtained a total of 6,243 tweets, which we were able to process with the feature selection algorithms. Boruta was used as a feature selection algorithm to determine the importance of topics to public opinion. The result was validated using multinomial logistic regression (MLR) performance and expert judgement. Important issues such as breeding sites, mosquito control, impact/funding, time of year, other diseases with similar symptoms, mosquito-human interaction and biomarkers for diagnosis were identified by both LDA and experts. The MLR result shows that the topics selected by LASSO perform significantly better than the other algorithms, and the experts further justify the topics in the discussion. |
first_indexed | 2024-12-09T00:52:29Z |
format | Article |
id | ums.eprints-41429 |
institution | Universiti Malaysia Sabah |
language | English |
last_indexed | 2024-12-09T00:52:29Z |
publishDate | 2024 |
publisher | PeerJ, Inc. |
record_format | dspace |
spelling | ums.eprints-41429 2024-10-16T06:11:41Z https://eprints.ums.edu.my/id/eprint/41429/ PeerJ, Inc. 2024 Article NonPeerReviewed text en https://eprints.ums.edu.my/id/eprint/41429/1/ABSTRACT.pdf text en https://eprints.ums.edu.my/id/eprint/41429/2/FULL%20TEXT.pdf Song-Quan Ong and Hamdan Ahmad (2024) Tracking mosquito-borne diseases via social media: a machine learning approach to topic modelling and sentiment analysis. PeerJ, 12 (1). pp. 1-17. ISSN 2167-8359 http://dx.doi.org/10.7717/peerj.17045 |
title | Tracking mosquito-borne diseases via social media: a machine learning approach to topic modelling and sentiment analysis |
title_sort | tracking mosquito borne diseases via social media a machine learning approach to topic modelling and sentiment analysis |
topic | RA1-418.5 Medicine and the state RC109-216 Infectious and parasitic diseases |
url | https://eprints.ums.edu.my/id/eprint/41429/1/ABSTRACT.pdf https://eprints.ums.edu.my/id/eprint/41429/2/FULL%20TEXT.pdf |
work_keys_str_mv | AT songquanong trackingmosquitobornediseasesviasocialmediaamachinelearningapproachtotopicmodellingandsentimentanalysis AT hamdanahmad trackingmosquitobornediseasesviasocialmediaamachinelearningapproachtotopicmodellingandsentimentanalysis |
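
The abstract in this record says that sentiment polarities were calculated with the VADER model. As a rough illustration of what a lexicon-based polarity step looks like, the sketch below sums word valences, squashes the sum into (-1, 1), and maps the result to a label. It is a toy stand-in and not VADER itself: `TOY_LEXICON` and its weights are invented for illustration, the function names are hypothetical, and the ±0.05 cut-offs follow the convention commonly recommended for VADER compound scores rather than anything stated in this record.

```python
import math
import re

# Hypothetical mini-lexicon with made-up valence weights (illustration only;
# the real VADER lexicon is far larger and also handles negation, intensifiers, etc.).
TOY_LEXICON = {
    "outbreak": -2.0,
    "sick": -1.5,
    "fear": -2.0,
    "safe": 1.5,
    "protect": 1.5,
    "thanks": 2.0,
}


def compound_score(text, alpha=15.0):
    """Sum the valence of known words and squash the total into (-1, 1)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    total = sum(TOY_LEXICON.get(token, 0.0) for token in tokens)
    # Normalisation of the form x / sqrt(x^2 + alpha); alpha=15 is an assumed default.
    return total / math.sqrt(total * total + alpha)


def polarity_label(compound, threshold=0.05):
    """Map a compound score to a label using an assumed +/-0.05 cut-off."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for tweet in ["dengue outbreak fear", "thanks for helping protect us", "mosquito season starts"]:
        print(tweet, "->", polarity_label(compound_score(tweet)))
```

In a pipeline like the one the abstract outlines, a label of this kind would serve as the response variable, with per-tweet topic weights from the topic model as candidate predictors for feature selection.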