Identifying the Risk Factors of Allergic Rhinitis Based on Zhihu Comment Data Using a Topic-Enhanced Word-Embedding Model: Mixed Method Study and Cluster Analysis
Main Authors: | Dongxiao Gu, Qin Wang, Yidong Chai, Xuejie Yang, Wang Zhao, Min Li, Oleg Zolotarev, Zhengfei Xu, Gongrang Zhang |
---|---|
Format: | Article |
Language: | English |
Published: | JMIR Publications, 2024-02-01 |
Series: | Journal of Medical Internet Research |
Online Access: | https://www.jmir.org/2024/1/e48324 |
_version_ | 1797299987701301248 |
---|---|
author | Dongxiao Gu; Qin Wang; Yidong Chai; Xuejie Yang; Wang Zhao; Min Li; Oleg Zolotarev; Zhengfei Xu; Gongrang Zhang |
author_facet | Dongxiao Gu; Qin Wang; Yidong Chai; Xuejie Yang; Wang Zhao; Min Li; Oleg Zolotarev; Zhengfei Xu; Gongrang Zhang |
author_sort | Dongxiao Gu |
collection | DOAJ |
description |
Background: Allergic rhinitis (AR) is a chronic disease, and several risk factors predispose individuals to the condition in their daily lives, including exposure to allergens and inhalation irritants. Analyzing the potential risk factors that can trigger AR can provide reference material for individuals to use to reduce its occurrence in their daily lives. Nowadays, social media is a part of daily life, with an increasing number of people using at least 1 platform regularly. Social media enables users to share experiences among large groups of people who share the same interests and experience the same afflictions. Notably, these channels promote the ability to share health information.
Objective: This study aims to construct an intelligent method (TopicS-ClusterREV) for identifying the risk factors of AR based on these social media comments. The main questions were as follows: How many comments contained AR risk factor information? How many categories can these risk factors be summarized into? How do these risk factors trigger AR?
Methods: This study crawled all the data from May 2012 to May 2022 under the topic of allergic rhinitis on Zhihu, obtaining a total of 9628 posts and 33,747 comments. We improved the Skip-gram model to train topic-enhanced word vector representations (TopicS) and then vectorized annotated text items for training the risk factor classifier. Furthermore, cluster analysis enabled a closer look into the opinions expressed in the category, namely gaining insight into how risk factors trigger AR.
Results: Our classifier identified more comments containing risk factors than the other classification models, with an accuracy rate of 96.1% and a recall rate of 96.3%. In general, we clustered texts containing risk factors into 28 categories, with season, region, and mites being the most common risk factors. We gained insight into the risk factors expressed in each category; for example, seasonal changes and increased temperature differences between day and night can disrupt the body’s immune system and lead to the development of allergies.
Conclusions: Our approach can handle the amount of data and extract risk factors effectively. Moreover, the summary of risk factors can serve as a reference for individuals to reduce AR in their daily lives. The experimental data also provide a potential pathway that triggers AR. This finding can guide the development of management plans and interventions for AR. |
first_indexed | 2024-03-07T22:59:39Z |
format | Article |
id | doaj.art-efaf7dbd90f44e1981de767e09d63b9a |
institution | Directory Open Access Journal |
issn | 1438-8871 |
language | English |
last_indexed | 2024-03-07T22:59:39Z |
publishDate | 2024-02-01 |
publisher | JMIR Publications |
record_format | Article |
series | Journal of Medical Internet Research |
spelling | doaj.art-efaf7dbd90f44e1981de767e09d63b9a; 2024-02-22T15:45:34Z; eng; JMIR Publications; Journal of Medical Internet Research; 1438-8871; 2024-02-01; 26; e48324; 10.2196/48324; Identifying the Risk Factors of Allergic Rhinitis Based on Zhihu Comment Data Using a Topic-Enhanced Word-Embedding Model: Mixed Method Study and Cluster Analysis; Dongxiao Gu (https://orcid.org/0000-0003-3557-009X); Qin Wang (https://orcid.org/0009-0008-1154-8596); Yidong Chai (https://orcid.org/0000-0003-0260-7589); Xuejie Yang (https://orcid.org/0000-0002-8258-7030); Wang Zhao (https://orcid.org/0000-0001-7453-9226); Min Li (https://orcid.org/0009-0001-4724-1763); Oleg Zolotarev (https://orcid.org/0000-0001-6917-9668); Zhengfei Xu (https://orcid.org/0009-0009-4309-8371); Gongrang Zhang (https://orcid.org/0009-0000-5494-366X); https://www.jmir.org/2024/1/e48324 |
spellingShingle | Dongxiao Gu; Qin Wang; Yidong Chai; Xuejie Yang; Wang Zhao; Min Li; Oleg Zolotarev; Zhengfei Xu; Gongrang Zhang; Identifying the Risk Factors of Allergic Rhinitis Based on Zhihu Comment Data Using a Topic-Enhanced Word-Embedding Model: Mixed Method Study and Cluster Analysis; Journal of Medical Internet Research |
title | Identifying the Risk Factors of Allergic Rhinitis Based on Zhihu Comment Data Using a Topic-Enhanced Word-Embedding Model: Mixed Method Study and Cluster Analysis |
title_sort | identifying the risk factors of allergic rhinitis based on zhihu comment data using a topic enhanced word embedding model mixed method study and cluster analysis |
url | https://www.jmir.org/2024/1/e48324 |
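
The Methods summary in this record mentions a Skip-gram model extended with topic information. As a rough illustration only, the sketch below shows how standard Skip-gram (center, context) training pairs can be generated from one tokenized comment. The optional topic pseudo-token is a hypothetical way of injecting topic information; the record does not describe how the authors actually modified the model, and the function name, parameters, and sample tokens are all assumptions made for this example.

```python
from typing import List, Optional, Tuple


def skipgram_pairs(
    tokens: List[str],
    window: int = 2,
    topic: Optional[str] = None,
) -> List[Tuple[str, str]]:
    """Return (center, context) training pairs for one tokenized comment.

    Standard Skip-gram pairs each word with its neighbors inside a
    symmetric window. If `topic` is given, each center word is also
    paired with a topic pseudo-token. ASSUMPTION: the topic pairing is
    illustrative and is not taken from the article.
    """
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(tokens):
        # Clamp the window to the bounds of the token list.
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
        if topic is not None:
            # Hypothetical topic-aware context token.
            pairs.append((center, f"<topic:{topic}>"))
    return pairs


if __name__ == "__main__":
    # Made-up example comment, already tokenized.
    comment = ["dust", "mites", "trigger", "sneezing"]
    for pair in skipgram_pairs(comment, window=1, topic="mites"):
        print(pair)
```

Pairs produced this way would normally be fed to an embedding trainer; the resulting vectors could then be averaged per comment before classification or clustering, although that downstream step is likewise not specified in this record.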