Tracking mosquito-borne diseases via social media: a machine learning approach to topic modelling and sentiment analysis

Mosquito-borne diseases (MBDs) are a major threat worldwide, and public consultation on these diseases is critical to disease control decision-making. However, traditional public surveys are time-consuming and labor-intensive and do not allow for timely decision-making. Recent studies have explored...

Bibliographic Details
Main Authors: Song-Quan Ong, Hamdan Ahmad
Format: Article
Language: English
Published: PeerJ, Inc. 2024
Subjects: RA1-418.5 Medicine and the state
RC109-216 Infectious and parasitic diseases
Online Access: https://eprints.ums.edu.my/id/eprint/41429/1/ABSTRACT.pdf
https://eprints.ums.edu.my/id/eprint/41429/2/FULL%20TEXT.pdf
_version_ 1817926522828226560
author Song-Quan Ong
Hamdan Ahmad
collection UMS
description Mosquito-borne diseases (MBDs) are a major threat worldwide, and public consultation on these diseases is critical to disease control decision-making. However, traditional public surveys are time-consuming and labor-intensive and do not allow for timely decision-making. Recent studies have explored text analytic approaches to elicit public comments from social media for public health. Therefore, this study aims to demonstrate a text analytics pipeline to identify the MBD topics that were discussed on Twitter and significantly influenced public opinion. A total of 25,000 tweets were retrieved from Twitter, topics were modelled using LDA and sentiment polarities were calculated using the VADER model. After data cleaning, we obtained a total of 6,243 tweets, which we were able to process with the feature selection algorithms. Boruta was used as a feature selection algorithm to determine the importance of topics to public opinion. The result was validated using multinomial logistic regression (MLR) performance and expert judgement. Important issues such as breeding sites, mosquito control, impact/funding, time of year, other diseases with similar symptoms, mosquito-human interaction and biomarkers for diagnosis were identified by both LDA and experts. The MLR result shows that the topics selected by LASSO perform significantly better than the other algorithms, and the experts further justify the topics in the discussion.
first_indexed 2024-12-09T00:52:29Z
format Article
id ums.eprints-41429
institution Universiti Malaysia Sabah
language English
last_indexed 2024-12-09T00:52:29Z
publishDate 2024
publisher PeerJ, Inc.
record_format dspace
spelling ums.eprints-41429 2024-10-16T06:11:41Z https://eprints.ums.edu.my/id/eprint/41429/ Article NonPeerReviewed
Song-Quan Ong and Hamdan Ahmad (2024) Tracking mosquito-borne diseases via social media: a machine learning approach to topic modelling and sentiment analysis. PeerJ, 12 (1). pp. 1-17. ISSN 2167-8359 http://dx.doi.org/10.7717/peerj.17045
title Tracking mosquito-borne diseases via social media: a machine learning approach to topic modelling and sentiment analysis
topic RA1-418.5 Medicine and the state
RC109-216 Infectious and parasitic diseases
url https://eprints.ums.edu.my/id/eprint/41429/1/ABSTRACT.pdf
https://eprints.ums.edu.my/id/eprint/41429/2/FULL%20TEXT.pdf