Predicting Drug Review Polarity Using the Combination Model of Multi-Sense Word Embedding and Fuzzy Latent Dirichlet Allocation (FLDA)
The massive volume of textual data generated in recent years has led to the development of new computer-based technologies, especially in healthcare. Sentiment analysis opens a new door in healthcare to improve public health data analysis and efficiently predict diseases. Many word...
Main Authors: | Siyue Song, Anju P. Johnson |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2023-01-01 |
Series: | IEEE Access |
Subjects: | Classification; drug review; feature extraction; fuzzy system; fuzzy set theory; healthcare data |
Online Access: | https://ieeexplore.ieee.org/document/10290872/ |
_version_ | 1797643654459817984 |
---|---|
author | Siyue Song; Anju P. Johnson |
author_facet | Siyue Song; Anju P. Johnson |
author_sort | Siyue Song |
collection | DOAJ |
description | The massive volume of textual data generated in recent years has led to the development of new computer-based technologies, especially in healthcare. Sentiment analysis opens a new door in healthcare to improve public health data analysis and efficiently predict diseases. Many words in natural language have multiple meanings or senses. However, traditional algorithms mainly focus on a single meaning and cannot capture the multiple senses of a word, leading to potential inaccuracies in sentiment analysis. Additionally, dealing with vagueness in linguistic terms is a common challenge in natural language processing; in particular, relying on simple term frequencies alone is insufficient to measure the development states of different topics. In this research, we applied two multi-sense word embedding models, Probabilistic FastText and Multi-sense Skip-gram, to the sentiment analysis of drug reviews. The proposed models can better represent words with multiple meanings, producing more accurate sentiment analysis results. Additionally, we compared multi-sense word embedding with single-sense embedding models and evaluated the classification methods against other classical machine learning techniques. Finally, a fuzzy system was applied to estimate the topics hidden in the drug review dataset using the Latent Dirichlet Allocation (LDA) model, and a fuzzy rule-based system was applied to explain the classification result of drug review polarity. Both models performed well on the classification task: Probabilistic FastText achieved an accuracy of 82.1%, and Multi-sense Skip-gram achieved an accuracy of 79.8%. The work addresses several critical challenges in sentiment analysis of healthcare data and proposes a comprehensive approach to tackle them. The reported results indicate promising performance, and the potential for future applications in other medical domains beyond drug reviews further highlights the significance of this research. |
first_indexed | 2024-03-11T14:18:02Z |
format | Article |
id | doaj.art-31b1bba87c5c4656afc1dc4fb1caf67d |
institution | Directory of Open Access Journals |
issn | 2169-3536 |
language | English |
last_indexed | 2024-03-11T14:18:02Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-31b1bba87c5c4656afc1dc4fb1caf67d; 2023-10-31T23:00:29Z; eng; IEEE; IEEE Access; 2169-3536; 2023-01-01; 11; 118538; 118546; 10.1109/ACCESS.2023.3326757; 10290872; Predicting Drug Review Polarity Using the Combination Model of Multi-Sense Word Embedding and Fuzzy Latent Dirichlet Allocation (FLDA); Siyue Song (0), https://orcid.org/0000-0003-2389-7503; Anju P. Johnson (1), https://orcid.org/0000-0002-7017-1644; Department of Computer Science, Centre for Industrial Analytics (CIndA), School of Computing and Engineering, University of Huddersfield, Queensgate Campus, Huddersfield, U.K.; Department of Computer Science, Centre for Industrial Analytics (CIndA), School of Computing and Engineering, University of Huddersfield, Queensgate Campus, Huddersfield, U.K.; https://ieeexplore.ieee.org/document/10290872/; Classification; drug review; feature extraction; fuzzy system; fuzzy set theory; healthcare data |
spellingShingle | Siyue Song; Anju P. Johnson; Predicting Drug Review Polarity Using the Combination Model of Multi-Sense Word Embedding and Fuzzy Latent Dirichlet Allocation (FLDA); IEEE Access; Classification; drug review; feature extraction; fuzzy system; fuzzy set theory; healthcare data |
title | Predicting Drug Review Polarity Using the Combination Model of Multi-Sense Word Embedding and Fuzzy Latent Dirichlet Allocation (FLDA) |
title_sort | predicting drug review polarity using the combination model of multi sense word embedding and fuzzy latent dirichlet allocation flda |
topic | Classification; drug review; feature extraction; fuzzy system; fuzzy set theory; healthcare data |
url | https://ieeexplore.ieee.org/document/10290872/ |
work_keys_str_mv | AT siyuesong predictingdrugreviewpolarityusingthecombinationmodelofmultisensewordembeddingandfuzzylatentdirichletallocationflda AT anjupjohnson predictingdrugreviewpolarityusingthecombinationmodelofmultisensewordembeddingandfuzzylatentdirichletallocationflda |
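
The description above notes that single-vector embeddings cannot capture the multiple senses of a word. As a rough illustration only, the sketch below picks a sense vector for an ambiguous word by comparing each candidate sense with the average of its context word vectors, in the spirit of multi-sense skip-gram models. It is not the authors' implementation: the 2-dimensional vectors, the sense labels, and the names `CONTEXT_VECTORS`, `SENSE_VECTORS`, and `pick_sense` are all hypothetical and chosen for readability.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Element-wise average of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Hypothetical toy vectors, invented for this sketch (not from the paper).
CONTEXT_VECTORS = {
    "flu": [1.0, 0.0],
    "cough": [0.9, 0.1],
    "winter": [0.0, 1.0],
    "weather": [0.1, 0.9],
}

# One vector per sense of the ambiguous word "cold" (labels are illustrative).
SENSE_VECTORS = {
    "cold": {
        "illness": [1.0, 0.1],
        "temperature": [0.1, 1.0],
    }
}


def pick_sense(word, context_words):
    """Return the sense label whose vector is closest to the averaged context.

    Returns None when no context word has a known vector.
    """
    known = [CONTEXT_VECTORS[w] for w in context_words if w in CONTEXT_VECTORS]
    if not known:
        return None
    context = mean_vector(known)
    senses = SENSE_VECTORS[word]
    return max(senses, key=lambda label: cosine(senses[label], context))


if __name__ == "__main__":
    print(pick_sense("cold", ["flu", "cough"]))
    print(pick_sense("cold", ["winter", "weather"]))
```

In a drug review, the context words around "cold" would steer the lookup toward the illness sense, which is the kind of distinction a single-vector embedding would blur. A trained model would learn the sense vectors from data rather than use hand-set values.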