Identifying the Risk Factors of Allergic Rhinitis Based on Zhihu Comment Data Using a Topic-Enhanced Word-Embedding Model: Mixed Method Study and Cluster Analysis

Bibliographic Details
Main Authors: Dongxiao Gu, Qin Wang, Yidong Chai, Xuejie Yang, Wang Zhao, Min Li, Oleg Zolotarev, Zhengfei Xu, Gongrang Zhang
Format: Article
Language: English
Published: JMIR Publications 2024-02-01
Series: Journal of Medical Internet Research
Online Access: https://www.jmir.org/2024/1/e48324
collection DOAJ
description Background: Allergic rhinitis (AR) is a chronic disease, and several risk factors predispose individuals to the condition in their daily lives, including exposure to allergens and inhalation irritants. Analyzing the potential risk factors that can trigger AR can provide reference material for individuals to use to reduce its occurrence in their daily lives. Nowadays, social media is a part of daily life, with an increasing number of people using at least 1 platform regularly. Social media enables users to share experiences among large groups of people who share the same interests and experience the same afflictions. Notably, these channels promote the ability to share health information.
Objective: This study aims to construct an intelligent method (TopicS-ClusterREV) for identifying the risk factors of AR based on these social media comments. The main questions were as follows: How many comments contained AR risk factor information? How many categories can these risk factors be summarized into? How do these risk factors trigger AR?
Methods: This study crawled all the data from May 2012 to May 2022 under the topic of allergic rhinitis on Zhihu, obtaining a total of 9628 posts and 33,747 comments. We improved the Skip-gram model to train topic-enhanced word vector representations (TopicS) and then vectorized annotated text items for training the risk factor classifier. Furthermore, cluster analysis enabled a closer look into the opinions expressed in the category, namely gaining insight into how risk factors trigger AR.
Results: Our classifier identified more comments containing risk factors than the other classification models, with an accuracy rate of 96.1% and a recall rate of 96.3%. In general, we clustered texts containing risk factors into 28 categories, with season, region, and mites being the most common risk factors. We gained insight into the risk factors expressed in each category; for example, seasonal changes and increased temperature differences between day and night can disrupt the body’s immune system and lead to the development of allergies.
Conclusions: Our approach can handle the amount of data and extract risk factors effectively. Moreover, the summary of risk factors can serve as a reference for individuals to reduce AR in their daily lives. The experimental data also provide a potential pathway that triggers AR. This finding can guide the development of management plans and interventions for AR.
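The Methods summary mentions a Skip-gram model extended with topic information. The record does not describe how TopicS does this, so the following is only a minimal sketch of one common way to combine the two ideas: generate standard Skip-gram (center, context) pairs, but over tokens that have been tagged with a topic id. The function names, the `word#topic` tagging scheme, the sample comment, and the topic assignments are all assumptions made for illustration and are not taken from the article.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs as used to train a plain Skip-gram model."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


def topic_tagged(tokens: List[str], topics: List[int]) -> List[str]:
    """Hypothetical tagging step: attach a topic id to each token so the same
    word under different topics is treated as a different vocabulary item."""
    if len(tokens) != len(topics):
        raise ValueError("tokens and topics must have the same length")
    return [f"{tok}#{top}" for tok, top in zip(tokens, topics)]


# Illustrative comment and made-up topic assignments (not from the study data).
comment = ["spring", "pollen", "triggers", "sneezing"]
topics = [0, 1, 1, 2]

pairs = skipgram_pairs(topic_tagged(comment, topics), window=1)
for center, context in pairs:
    print(center, "->", context)
```

In a full pipeline these pairs would feed an embedding trainer, and the resulting vectors would be used to represent annotated comments for the classifier and the cluster analysis; those later steps are not shown here.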
first_indexed 2024-03-07T22:59:39Z
format Article
id doaj.art-efaf7dbd90f44e1981de767e09d63b9a
institution Directory Open Access Journal
issn 1438-8871
last_indexed 2024-03-07T22:59:39Z
volume 26
article number e48324
doi 10.2196/48324
record timestamp 2024-02-22T15:45:34Z
orcid Dongxiao Gu https://orcid.org/0000-0003-3557-009X
Qin Wang https://orcid.org/0009-0008-1154-8596
Yidong Chai https://orcid.org/0000-0003-0260-7589
Xuejie Yang https://orcid.org/0000-0002-8258-7030
Wang Zhao https://orcid.org/0000-0001-7453-9226
Min Li https://orcid.org/0009-0001-4724-1763
Oleg Zolotarev https://orcid.org/0000-0001-6917-9668
Zhengfei Xu https://orcid.org/0009-0009-4309-8371
Gongrang Zhang https://orcid.org/0009-0000-5494-366X