Methods to Develop an Electronic Medical Record Phenotype Algorithm to Compare the Risk of Coronary Artery Disease across 3 Chronic Disease Cohorts

Background Typically, algorithms to classify phenotypes using electronic medical record (EMR) data were developed to perform well in a specific patient population. There is increasing interest in analyses which can allow study of a specific outcome across different diseases. Such a study in the EMR...

Bibliographic Details
Main Authors: Liao, Katherine P., Ananthakrishnan, Ashwin N., Kumar, Vishesh, Xia, Zongqi, Cagan, Andrew, Gainer, Vivian S., Goryachev, Sergey, Chen, Pei, Savova, Guergana K., Agniel, Denis, Churchill, Susanne, Lee, Jaeyoung, Murphy, Shawn N., Plenge, Robert M., Szolovits, Peter, Kohane, Isaac, Shaw, Stanley Y., Karlson, Elizabeth W., Cai, Tianxi
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format: Article
Language: en_US
Published: Public Library of Science 2015
Online Access: http://hdl.handle.net/1721.1/99879
https://orcid.org/0000-0001-8411-6403
collection MIT
description Background: Algorithms that classify phenotypes using electronic medical record (EMR) data have typically been developed to perform well in a specific patient population. There is increasing interest in analyses that allow study of a specific outcome across different diseases. Such a study in the EMR would require an algorithm that can be applied across different patient populations. Our objectives were (1) to develop an algorithm that would enable the study of coronary artery disease (CAD) across diverse patient populations and (2) to study the impact of adding narrative data extracted using natural language processing (NLP) to the algorithm. Additionally, we demonstrate how to implement the CAD algorithm to compare risk across 3 chronic diseases in a preliminary study.
Methods and Results: We studied 3 established EMR-based patient cohorts from two large academic centers: diabetes mellitus (DM, n = 65,099), inflammatory bowel disease (IBD, n = 10,974), and rheumatoid arthritis (RA, n = 4,453). We developed a CAD algorithm using NLP in addition to structured data (e.g. ICD9 codes) in the RA cohort and validated it in the DM and IBD cohorts. The CAD algorithm using NLP in addition to structured data achieved specificity >95% with a positive predictive value (PPV) of 90% in the training (RA) and validation (IBD and DM) sets. The addition of NLP data improved the sensitivity for all cohorts, classifying an additional 17% of CAD subjects in IBD and 10% in DM while maintaining a PPV of 90%. The algorithm classified 16,488 DM (26.1%), 457 IBD (4.2%), and 245 RA (5.0%) subjects as having CAD. In a cross-sectional analysis, CAD risk was 63% lower in RA and 68% lower in IBD compared to DM (p<0.0001) after adjusting for traditional cardiovascular risk factors.
Conclusions: We developed and validated a CAD algorithm that performed well across diverse patient populations. The addition of NLP to the CAD algorithm improved its sensitivity, particularly in cohorts where the prevalence of CAD was low. Preliminary data suggest that CAD risk was significantly lower in RA and IBD compared to DM.
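The performance measures named in the abstract (sensitivity, specificity, PPV) can be illustrated with a minimal sketch. It is not the authors' algorithm or evaluation code; the function names and all counts below are hypothetical and chosen only to show how the measures are computed and why PPV depends on the prevalence of the outcome in a cohort.

```python
def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard classifier metrics from counts of a chart-reviewed sample."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases the algorithm finds
        "specificity": tn / (tn + fp),  # share of non-cases correctly excluded
        "ppv": tp / (tp + fp),          # share of flagged subjects who are true cases
    }


def ppv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV via Bayes' rule, making its dependence on prevalence explicit."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical validation counts (NOT taken from the paper).
metrics = confusion_metrics(tp=90, fp=10, tn=380, fn=20)

# With sensitivity and specificity held fixed, PPV falls as prevalence falls.
# The prevalence values here are illustrative assumptions.
ppv_low_prevalence = ppv_from_rates(0.80, 0.95, prevalence=0.05)
ppv_high_prevalence = ppv_from_rates(0.80, 0.95, prevalence=0.25)

if __name__ == "__main__":
    print(metrics)
    print(ppv_low_prevalence, ppv_high_prevalence)
```

Under these assumed numbers, holding PPV at a fixed target in a low-prevalence cohort requires very high specificity, which is one reason an added data source can change how many true cases are classified at that target.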
first_indexed 2024-09-23T16:21:30Z
format Article
id mit-1721.1/99879
institution Massachusetts Institute of Technology
language en_US
last_indexed 2024-09-23T16:21:30Z
publishDate 2015
publisher Public Library of Science
record_format dspace
spelling mit-1721.1/99879 2022-09-29T19:37:59Z
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Funding: National Institutes of Health (U.S.). Informatics for Integrating Biology and the Bedside Project (U54LM008748)
Dates: 2015-11-10T16:23:06Z; 2015-08; 2014-09
Type: Article (http://purl.org/eprint/type/JournalArticle)
ISSN: 1932-6203
Citation: Liao, Katherine P., Ashwin N. Ananthakrishnan, Vishesh Kumar, Zongqi Xia, Andrew Cagan, Vivian S. Gainer, Sergey Goryachev, et al. “Methods to Develop an Electronic Medical Record Phenotype Algorithm to Compare the Risk of Coronary Artery Disease across 3 Chronic Disease Cohorts.” Edited by Giorgos Bamias. PLOS ONE 10, no. 8 (August 24, 2015): e0136651.
DOI: http://dx.doi.org/10.1371/journal.pone.0136651
Journal: PLOS ONE
License: Creative Commons Attribution (http://creativecommons.org/licenses/by/4.0/)
File format: application/pdf
title Methods to Develop an Electronic Medical Record Phenotype Algorithm to Compare the Risk of Coronary Artery Disease across 3 Chronic Disease Cohorts