Predicting Drug Review Polarity Using the Combination Model of Multi-Sense Word Embedding and Fuzzy Latent Dirichlet Allocation (FLDA)

The massive volume of textual data generated in recent years has led to the development of new computer-based technologies, especially in healthcare. Sentiment analysis opens a new door in healthcare to improve public health data analysis and efficiently predict diseases. Many word...

Bibliographic Details
Main Authors: Siyue Song, Anju P. Johnson
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/10290872/
collection DOAJ
description The massive volume of textual data generated in recent years has led to the development of new computer-based technologies, especially in healthcare. Sentiment analysis opens a new door in healthcare to improve public health data analysis and efficiently predict diseases. Many words in natural language have multiple meanings or senses. However, traditional algorithms mainly focus on a single meaning and cannot capture the multiple senses of a word, leading to potential inaccuracies in sentiment analysis. Additionally, dealing with vagueness in linguistic terms is a common challenge in natural language processing; in particular, simple term frequencies are insufficient to measure the development states of different topics. In this research, we applied two multi-sense word embedding models, Probabilistic Fasttext and Multi-sense Skip-gram, to the sentiment analysis of drug reviews. The proposed models can better represent words with multiple meanings, producing more accurate sentiment analysis results. Additionally, we compared multi-sense word embedding with single embedding models and evaluated the classification methods against other classical machine learning techniques. Finally, a fuzzy system was applied to estimate the topics hidden in the drug review dataset using the Latent Dirichlet Allocation (LDA) model, and a fuzzy rule-based system was applied to explain the classification result of drug review polarity. Both models performed well on the classification task: Probabilistic Fasttext achieved an accuracy of 82.1%, and Multi-sense Skip-gram achieved an accuracy of 79.8%. The work has addressed several critical challenges related to sentiment analysis of healthcare data and has proposed a comprehensive approach to tackle them. The reported results indicate promising performance, and potential future applications in medical domains beyond drug reviews further highlight the significance of this research.
first_indexed 2024-03-11T14:18:02Z
id doaj.art-31b1bba87c5c4656afc1dc4fb1caf67d
institution Directory of Open Access Journals
issn 2169-3536
last_indexed 2024-03-11T14:18:02Z
spelling doaj.art-31b1bba87c5c4656afc1dc4fb1caf67d 2023-10-31T23:00:29Z eng IEEE IEEE Access 2169-3536 2023-01-01 vol. 11, pp. 118538-118546 doi:10.1109/ACCESS.2023.3326757 10290872
Siyue Song (0) https://orcid.org/0000-0003-2389-7503
Anju P. Johnson (1) https://orcid.org/0000-0002-7017-1644
Affiliation (0, 1): Department of Computer Science, Centre for Industrial Analytics (CIndA), School of Computing and Engineering, University of Huddersfield, Queensgate Campus, Huddersfield, U.K.
topic Classification
drug review
feature extraction
fuzzy system
fuzzy set theory
healthcare data