Mining FDA drug labels using an unsupervised learning technique - topic modeling

Abstract. Background: The Food and Drug Administration (FDA) approved drug labels contain a broad array of information, ranging from adverse drug reactions (ADRs) to drug efficacy, risk-benefit consideration, and more. However, the labeling language used...


Bibliographic Details
Main Authors:	Xu Xiaowei, Fang Hong, Liu Zhichao, Bisgin Halil, Tong Weida
Format:	Article
Language:	English
Published:	BMC 2011-10-01
Series:	BMC Bioinformatics

_version_	1818229912515903488
author	Xu Xiaowei Fang Hong Liu Zhichao Bisgin Halil Tong Weida
collection	DOAJ
description	Abstract. Background: The Food and Drug Administration (FDA) approved drug labels contain a broad array of information, ranging from adverse drug reactions (ADRs) to drug efficacy, risk-benefit consideration, and more. However, the labeling language used to describe this information is free text that often contains ambiguous semantic descriptions, which makes it difficult to retrieve useful information from the labeling text consistently and accurately for comparative analysis across drugs. Consequently, this task has largely relied on manual reading of the full text by experts, which is time-consuming and labor-intensive. Method: In this study, a novel unsupervised text mining method called topic modeling was applied to drug labeling with the goal of discovering “topics” that group together drugs with similar safety concerns and/or therapeutic uses. A total of 794 FDA-approved drug labels were used. First, three labeling sections (Boxed Warning, Warnings and Precautions, and Adverse Reactions) of each drug label were processed with the Medical Dictionary for Regulatory Activities (MedDRA) to convert the free text of each label to standard ADR terms. Next, topic modeling with latent Dirichlet allocation (LDA) was applied to generate 100 topics, each associated with a set of drugs grouped together based on probability analysis. Lastly, the efficacy of the topic modeling was evaluated against known information about the therapeutic uses and safety data of the drugs. Results: The results demonstrate that drugs grouped by topic are associated with the same safety concerns and/or therapeutic uses with statistical significance (P < 0.05). The identified topics have distinct contexts that can be directly linked to specific adverse events (e.g., liver injury or kidney injury) or therapeutic applications (e.g., antiinfectives for systemic use). The topics also made it possible to identify potential adverse events that might arise from specific medications. Conclusions: The successful application of topic modeling to FDA drug labeling demonstrates its potential utility as a means of hypothesis generation for inferring hidden relationships between concepts in biomedical documents, such as, in this study, drug safety and therapeutic use.
first_indexed	2024-12-12T10:26:08Z
format	Article
id	doaj.art-f23cd13f8b9e411e8ddec7ade5d557f8
institution	Directory Open Access Journal
issn	1471-2105
language	English
last_indexed	2024-12-12T10:26:08Z
publishDate	2011-10-01
publisher	BMC
record_format	Article
series	BMC Bioinformatics
spelling	doaj.art-f23cd13f8b9e411e8ddec7ade5d557f8; 2022-12-22T00:27:27Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2011-10-01; 12; Suppl 10; S11; 10.1186/1471-2105-12-S10-S11; Mining FDA drug labels using an unsupervised learning technique - topic modeling; Xu Xiaowei; Fang Hong; Liu Zhichao; Bisgin Halil; Tong Weida
title	Mining FDA drug labels using an unsupervised learning technique - topic modeling
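
As a rough illustration of the method summarized in the abstract (standardized ADR terms per label, then LDA to group drugs by topic), the sketch below runs a small collapsed Gibbs sampler for LDA on made-up data. The drug names, term lists, hyperparameters (alpha, beta), iteration count, and the choice of Gibbs sampling are all assumptions for illustration; this record does not state which inference procedure or software the authors used, and the study itself worked with 794 labels and 100 topics.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative, not the authors' code).

    docs maps a drug name to its list of ADR terms. Returns (theta, phi):
    per-drug topic probabilities and per-topic term probabilities.
    """
    rng = random.Random(seed)
    names = list(docs)
    vocab = sorted({term for terms in docs.values() for term in terms})
    n_vocab = len(vocab)

    doc_topic = {d: [0] * n_topics for d in names}            # terms per topic in each drug
    topic_term = [defaultdict(int) for _ in range(n_topics)]  # term counts per topic
    topic_total = [0] * n_topics                              # total terms per topic
    assign = {d: [] for d in names}                           # topic of each term occurrence

    # Start from a random topic assignment for every term occurrence.
    for d in names:
        for term in docs[d]:
            k = rng.randrange(n_topics)
            assign[d].append(k)
            doc_topic[d][k] += 1
            topic_term[k][term] += 1
            topic_total[k] += 1

    for _ in range(n_iter):
        for d in names:
            for i, term in enumerate(docs[d]):
                # Remove the current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_term[k][term] -= 1
                topic_total[k] -= 1
                # Resample the topic from its conditional distribution.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_term[j][term] + beta)
                    / (topic_total[j] + n_vocab * beta)
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_term[k][term] += 1
                topic_total[k] += 1

    theta = {
        d: [(c + alpha) / (len(docs[d]) + n_topics * alpha) for c in doc_topic[d]]
        for d in names
    }
    phi = [
        {t: (topic_term[k][t] + beta) / (topic_total[k] + n_vocab * beta) for t in vocab}
        for k in range(n_topics)
    ]
    return theta, phi


# Hypothetical labels already reduced to ADR terms (invented for this example).
toy_labels = {
    "drug_A": ["hepatotoxicity", "jaundice", "hepatic failure", "nausea"],
    "drug_B": ["jaundice", "hepatotoxicity", "hepatitis", "nausea"],
    "drug_C": ["renal failure", "nephritis", "proteinuria", "nausea"],
    "drug_D": ["nephritis", "renal failure", "haematuria", "proteinuria"],
}

theta, phi = lda_gibbs(toy_labels, n_topics=2)
for drug, probs in theta.items():
    best = max(range(len(probs)), key=probs.__getitem__)
    print(drug, "-> topic", best, [round(p, 2) for p in probs])
```

Assigning each drug to its highest-probability topic is one simple way to read "grouped together based on probability analysis"; the paper may use a different rule, such as a probability threshold.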


Similar Items