An algorithm to identify cases of pulmonary arterial hypertension from the electronic medical record

Bibliographic Details
Main Authors: Kyle P. Schuler, Anna R. Hemnes, Jeffrey Annis, Eric Farber-Eger, Brandon D. Lowery, Stephen J. Halliday, Evan L. Brittain
Format: Article
Language: English
Published: BMC 2022-05-01
Series: Respiratory Research
Subjects: Pulmonary hypertension; Machine learning; Algorithm
Online Access: https://doi.org/10.1186/s12931-022-02055-0
collection DOAJ
description Abstract

Background: Studying pulmonary arterial hypertension (PAH) in claims-based (CB) cohorts may facilitate understanding of disease epidemiology; however, previous CB algorithms to identify PAH have had limited test characteristics. We hypothesized that machine learning algorithms (MLAs) could accurately identify PAH in a CB cohort.

Methods: ICD-9/10 codes, CPT codes, or PAH medications were used to screen an electronic medical record (EMR) for possible PAH. A subset (the Development Cohort) was manually reviewed, adjudicated as PAH or “not PAH”, and used to train and test MLAs. A second subset (the Refinement Cohort) was manually reviewed and combined with the Development Cohort to form the Final Cohort, which was again divided into training and testing sets; MLA test characteristics were defined on the test set. The MLA was validated using an independent EMR cohort.

Results: The Development Cohort, comprising 194 PAH and 786 “not PAH” cases, was used to train and test the initial MLA. In the Final Cohort test set, the final MLA had a sensitivity of 0.88, a specificity of 0.93, a positive predictive value of 0.89, and a negative predictive value of 0.92. Persistence and strength of PAH medication use and the CPT code for right heart catheterization were the principal MLA features. Applying the MLA to the EMR cohort using a split-cohort internal validation approach, we found 265 additional non-confirmed cases of suspected PAH that exhibited typical PAH demographics, comorbidities, and hemodynamics.

Conclusions: We developed and validated an MLA using only CB features that identified PAH in the EMR with strong test characteristics. When deployed across an entire EMR, the MLA identified cases with known features of PAH.
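The four test characteristics reported in the Results are ratios drawn from a confusion matrix of algorithm calls versus manual adjudication. The Python sketch below illustrates only that arithmetic; the `ConfusionMatrix` class, the `compute_test_characteristics` function, and all counts are hypothetical and are not taken from the study or its code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Hypothetical counts from comparing classifier calls with chart review."""

    tp: int  # adjudicated PAH, classified as PAH
    fp: int  # adjudicated "not PAH", classified as PAH
    tn: int  # adjudicated "not PAH", classified as "not PAH"
    fn: int  # adjudicated PAH, classified as "not PAH"


def compute_test_characteristics(cm: ConfusionMatrix) -> dict[str, float]:
    """Return sensitivity, specificity, PPV and NPV (ZeroDivisionError if a denominator is 0)."""
    return {
        "sensitivity": cm.tp / (cm.tp + cm.fn),  # fraction of true PAH cases detected
        "specificity": cm.tn / (cm.tn + cm.fp),  # fraction of non-PAH cases correctly excluded
        "ppv": cm.tp / (cm.tp + cm.fp),  # fraction of PAH calls that are true PAH
        "npv": cm.tn / (cm.tn + cm.fn),  # fraction of "not PAH" calls that are truly not PAH
    }


if __name__ == "__main__":
    # Illustrative counts only; these are not the study's data.
    balanced = ConfusionMatrix(tp=40, fp=10, tn=90, fn=10)
    print(compute_test_characteristics(balanced))
    # → {'sensitivity': 0.8, 'specificity': 0.9, 'ppv': 0.8, 'npv': 0.9}

    # Same sensitivity and specificity, but ten times as many non-PAH records:
    # PPV falls because false positives now outnumber true positives.
    rare = ConfusionMatrix(tp=40, fp=100, tn=900, fn=10)
    print(compute_test_characteristics(rare))
```

The second call shows why sensitivity and specificity are usually treated as properties of the classifier itself, while PPV and NPV also depend on how common PAH is in the evaluated set and can be expected to shift when the algorithm is applied to a population with a different PAH prevalence.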
institution Directory of Open Access Journals
issn 1465-993X
Author affiliations:
Kyle P. Schuler: Department of Internal Medicine, Vanderbilt University Medical Center
Anna R. Hemnes: Division of Allergy, Pulmonary and Critical Care Medicine, Vanderbilt University Medical Center
Jeffrey Annis: Division of Cardiovascular Medicine, Vanderbilt Pulmonary Circulation Center
Eric Farber-Eger: Division of Cardiovascular Medicine, Vanderbilt Pulmonary Circulation Center
Brandon D. Lowery: Division of Cardiovascular Medicine, Vanderbilt Pulmonary Circulation Center
Stephen J. Halliday: Division of Pulmonary and Critical Care Medicine, University of Wisconsin School of Medicine and Public Health
Evan L. Brittain: Division of Cardiovascular Medicine, Vanderbilt Pulmonary Circulation Center
topic Pulmonary hypertension
Machine learning
Algorithm