An algorithm to identify cases of pulmonary arterial hypertension from the electronic medical record
Abstract Background: Study of pulmonary arterial hypertension (PAH) in claims-based (CB) cohorts may facilitate understanding of disease epidemiology; however, previous CB algorithms to identify PAH have had limited test characteristics. We hypothesized that machine learning algorithms (MLA) could acc...
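Main Authors: | Kyle P. Schuler, Anna R. Hemnes, Jeffrey Annis, Eric Farber-Eger, Brandon D. Lowery, Stephen J. Halliday, Evan L. Brittain |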
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2022-05-01 |
Series: | Respiratory Research |
Subjects: | Pulmonary hypertension, Machine learning, Algorithm |
Online Access: | https://doi.org/10.1186/s12931-022-02055-0 |
_version_ | 1818204655275999232 |
---|---|
author | Kyle P. Schuler, Anna R. Hemnes, Jeffrey Annis, Eric Farber-Eger, Brandon D. Lowery, Stephen J. Halliday, Evan L. Brittain |
author_facet | Kyle P. Schuler, Anna R. Hemnes, Jeffrey Annis, Eric Farber-Eger, Brandon D. Lowery, Stephen J. Halliday, Evan L. Brittain |
author_sort | Kyle P. Schuler |
collection | DOAJ |
description | Abstract Background: Study of pulmonary arterial hypertension (PAH) in claims-based (CB) cohorts may facilitate understanding of disease epidemiology; however, previous CB algorithms to identify PAH have had limited test characteristics. We hypothesized that machine learning algorithms (MLA) could accurately identify PAH in a CB cohort. Methods: ICD-9/10 codes, CPT codes, or PAH medications were used to screen an electronic medical record (EMR) for possible PAH. A subset (Development Cohort) was manually reviewed and adjudicated as PAH or “not PAH” and used to train and test MLAs. A second subset (Refinement Cohort) was manually reviewed and combined with the Development Cohort to make the Final Cohort, again divided into training and testing sets, with MLA characteristics defined on the test set. The MLA was validated using an independent EMR cohort. Results: The initial MLA was trained and tested on 194 PAH and 786 “not PAH” cases in the Development Cohort. In the Final Cohort test set, the final MLA sensitivity was 0.88, specificity was 0.93, positive predictive value was 0.89, and negative predictive value was 0.92. Persistence and strength of PAH medication use and the CPT code for right heart catheterization were the principal MLA features. Applying the MLA to the EMR cohort using a split cohort internal validation approach, we found 265 additional non-confirmed cases of suspected PAH that exhibited typical PAH demographics, comorbidities, and hemodynamics. Conclusions: We developed and validated an MLA using only CB features that identified PAH in the EMR with strong test characteristics. When deployed across an entire EMR, the MLA identified cases with known features of PAH. |
first_indexed | 2024-12-12T03:44:41Z |
format | Article |
id | doaj.art-f952db8ea96f4428a05307dab06006be |
institution | Directory Open Access Journal |
issn | 1465-993X |
language | English |
last_indexed | 2024-12-12T03:44:41Z |
publishDate | 2022-05-01 |
publisher | BMC |
record_format | Article |
series | Respiratory Research |
spelling | doaj.art-f952db8ea96f4428a05307dab06006be; 2022-12-22T00:39:36Z; eng; BMC; Respiratory Research; 1465-993X; 2022-05-01; 231110; 10.1186/s12931-022-02055-0; An algorithm to identify cases of pulmonary arterial hypertension from the electronic medical record; Kyle P. Schuler [0], Anna R. Hemnes [1], Jeffrey Annis [2], Eric Farber-Eger [3], Brandon D. Lowery [4], Stephen J. Halliday [5], Evan L. Brittain [6]; [0] Department of Internal Medicine, Vanderbilt University Medical Center; [1] Division of Allergy, Pulmonary and Critical Care Medicine, Vanderbilt University Medical Center; [2] Division of Cardiovascular Medicine, Vanderbilt Pulmonary Circulation Center; [3] Division of Cardiovascular Medicine, Vanderbilt Pulmonary Circulation Center; [4] Division of Cardiovascular Medicine, Vanderbilt Pulmonary Circulation Center; [5] Division of Pulmonary and Critical Care Medicine, University of Wisconsin School of Medicine and Public Health; [6] Division of Cardiovascular Medicine, Vanderbilt Pulmonary Circulation Center; Abstract Background: Study of pulmonary arterial hypertension (PAH) in claims-based (CB) cohorts may facilitate understanding of disease epidemiology; however, previous CB algorithms to identify PAH have had limited test characteristics. We hypothesized that machine learning algorithms (MLA) could accurately identify PAH in a CB cohort. Methods: ICD-9/10 codes, CPT codes, or PAH medications were used to screen an electronic medical record (EMR) for possible PAH. A subset (Development Cohort) was manually reviewed and adjudicated as PAH or “not PAH” and used to train and test MLAs. A second subset (Refinement Cohort) was manually reviewed and combined with the Development Cohort to make the Final Cohort, again divided into training and testing sets, with MLA characteristics defined on the test set. The MLA was validated using an independent EMR cohort. Results: The initial MLA was trained and tested on 194 PAH and 786 “not PAH” cases in the Development Cohort. In the Final Cohort test set, the final MLA sensitivity was 0.88, specificity was 0.93, positive predictive value was 0.89, and negative predictive value was 0.92. Persistence and strength of PAH medication use and the CPT code for right heart catheterization were the principal MLA features. Applying the MLA to the EMR cohort using a split cohort internal validation approach, we found 265 additional non-confirmed cases of suspected PAH that exhibited typical PAH demographics, comorbidities, and hemodynamics. Conclusions: We developed and validated an MLA using only CB features that identified PAH in the EMR with strong test characteristics. When deployed across an entire EMR, the MLA identified cases with known features of PAH.; https://doi.org/10.1186/s12931-022-02055-0; Pulmonary hypertension; Machine learning; Algorithm |
title | An algorithm to identify cases of pulmonary arterial hypertension from the electronic medical record |
title_sort | algorithm to identify cases of pulmonary arterial hypertension from the electronic medical record |
topic | Pulmonary hypertension Machine learning Algorithm |
url | https://doi.org/10.1186/s12931-022-02055-0 |