A claims‐based, machine‐learning algorithm to identify patients with pulmonary arterial hypertension
Main Authors: | Bethany Hyde, Carly J. Paoli, Sumeet Panjabi, Katherine C. Bettencourt, Karimah S. Bell Lynum, Mona Selej |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2023-04-01 |
Series: | Pulmonary Circulation |
Subjects: | early diagnosis; rare disease; real‐world evidence |
Online Access: | https://doi.org/10.1002/pul2.12237 |
Field | Value |
---|---|
author | Bethany Hyde, Carly J. Paoli, Sumeet Panjabi, Katherine C. Bettencourt, Karimah S. Bell Lynum, Mona Selej |
collection | DOAJ |
description | Many patients with pulmonary arterial hypertension (PAH) experience substantial delays in diagnosis, which are associated with worse outcomes and higher costs. Tools for diagnosing PAH sooner may lead to earlier treatment, which may delay disease progression and adverse outcomes including hospitalization and death. We developed a machine‐learning (ML) algorithm to identify patients at risk for PAH earlier in their symptom journey and distinguish them from patients with similar early symptoms not at risk for developing PAH. Our supervised ML model analyzed retrospective, de‐identified data from the US‐based Optum® Clinformatics® Data Mart claims database (January 2015 to December 2019). Propensity score‐matched PAH and non‐PAH (control) cohorts were established based on observed differences. Random forest models were used to classify patients as PAH or non‐PAH at diagnosis and at 6 months prediagnosis. The PAH and non‐PAH cohorts included 1339 and 4222 patients, respectively. At 6 months prediagnosis, the model performed well in distinguishing PAH and non‐PAH patients, with an area under the receiver operating characteristic curve (AUC) of 0.84, recall (sensitivity) of 0.73, and precision of 0.50. Key features distinguishing PAH from non‐PAH cohorts were a longer time between first symptom and the prediagnosis model date (i.e., 6 months before diagnosis); more diagnostic and prescription claims, circulatory claims, and imaging procedures, leading to higher overall healthcare resource utilization; and more hospitalizations. Our model distinguishes between patients with and without PAH at 6 months before diagnosis and illustrates the feasibility of using routine claims data to identify patients at a population level who might benefit from PAH‐specific screening and/or earlier specialist referral. |
first_indexed | 2024-03-13T02:54:17Z |
id | doaj.art-342a74dda5fd48bca1f83585d3436964 |
institution | Directory of Open Access Journals |
issn | 2045-8940 |
last_indexed | 2024-03-13T02:54:17Z |
affiliation | Bethany Hyde: Janssen Business Technology Commercial Data Insights & Data Science, Titusville, New Jersey, USA; Carly J. Paoli and Sumeet Panjabi: Janssen Scientific Affairs, Inc., Titusville, New Jersey, USA; Katherine C. Bettencourt and Karimah S. Bell Lynum: Actelion Pharmaceuticals US, Inc., Titusville, New Jersey, USA; Mona Selej: Janssen R&D Data Science, South San Francisco, California, USA |
citation | Pulmonary Circulation 13(2), 2023-04-01, ISSN 2045-8940, doi:10.1002/pul2.12237 |
topic | early diagnosis rare disease real‐world evidence |
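The abstract says the PAH and non‐PAH cohorts were propensity score‐matched, but it does not say how. A common approach is greedy nearest‐neighbour matching without replacement, using a caliper on precomputed scores. The sketch below illustrates that technique only. The function name `greedy_match`, the default ratio `k=3` and the caliper `0.05` are hypothetical choices, not parameters reported for this study. The scores are assumed to come from a separate model, for example a logistic regression on observed covariates.

```python
from typing import Dict, List


def greedy_match(
    treated: Dict[str, float],
    controls: Dict[str, float],
    k: int = 3,            # hypothetical controls-per-case ratio
    caliper: float = 0.05,  # hypothetical maximum allowed score distance
) -> Dict[str, List[str]]:
    """Greedy k:1 nearest-neighbour matching on precomputed propensity scores.

    treated, controls: map a patient id to its propensity score.
    Returns a map from each matched treated id to its list of control ids.
    Each control is used at most once (matching without replacement).
    """
    available = dict(controls)
    matches: Dict[str, List[str]] = {}
    # Match the highest-scoring cases first; they usually have the fewest close controls.
    for tid, ts in sorted(treated.items(), key=lambda kv: kv[1], reverse=True):
        # Rank the remaining controls by absolute distance in propensity score.
        ranked = sorted(available.items(), key=lambda kv: abs(kv[1] - ts))
        # Keep up to k of the nearest controls, but only those inside the caliper.
        chosen = [cid for cid, cs in ranked[:k] if abs(cs - ts) <= caliper]
        for cid in chosen:
            del available[cid]
        if chosen:
            matches[tid] = chosen
    return matches
```

For example, `greedy_match({"a": 0.80, "b": 0.30}, {"c1": 0.79, "c2": 0.82, "c3": 0.31, "c4": 0.10, "c5": 0.50}, k=2)` pairs `a` with `c1` and `c2`. It pairs `b` only with `c3`, because the next‐nearest controls fall outside the caliper. The final cohort sizes in the study (1339 vs 4222) do not correspond to a fixed ratio, which is consistent with some cases receiving fewer matches than the maximum.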
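The prediagnosis model is scored at a model date 6 months before diagnosis. One of its key features is the time between the first symptom and that model date. A minimal sketch of how such a feature window might be built follows. It assumes a hypothetical `Claim` record with a coarse `category` label and approximates "6 months" as a fixed 182‐day offset. Neither the schema nor the exact offset comes from the study. The point it illustrates is that only claims dated before the model date feed the features.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List, Optional


# Hypothetical, simplified claim record; real claims databases use far richer schemas.
@dataclass
class Claim:
    service_date: date
    category: str  # hypothetical labels such as "symptom", "imaging", "circulatory", "rx"


# Assumption: "6 months before diagnosis" approximated as a fixed 182-day offset.
PREDIAGNOSIS_OFFSET = timedelta(days=182)


def model_date(index_date: date) -> date:
    """Prediagnosis model date: a fixed offset before the diagnosis (or control index) date."""
    return index_date - PREDIAGNOSIS_OFFSET


def window_features(claims: List[Claim], index_date: date) -> Dict[str, Optional[int]]:
    cutoff = model_date(index_date)
    # Only claims strictly before the model date are visible, so nothing from the
    # final 6 months (or from the diagnosis itself) can leak into the features.
    visible = [c for c in claims if c.service_date < cutoff]
    symptom_dates = [c.service_date for c in visible if c.category == "symptom"]
    features: Dict[str, Optional[int]] = {
        # Days from the first visible symptom claim to the model date (None if none).
        "days_first_symptom_to_model_date": (cutoff - min(symptom_dates)).days if symptom_dates else None,
        "total_claims": len(visible),
    }
    # Per-category claim counts as a crude proxy for healthcare resource utilization.
    for c in visible:
        key = f"n_{c.category}"
        features[key] = (features.get(key) or 0) + 1
    return features
```

For a diagnosis on 2019‐07‐01 the model date is 2018‐12‐31. A symptom claim on 2018‐06‐30 then contributes 184 days to the first‐symptom feature. Claims on or after the model date are ignored. How the study assigned index dates to non‐PAH controls is not stated here, so the sketch simply accepts whatever index date is supplied.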
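The abstract reports AUC, recall (sensitivity) and precision for the 6‐month prediagnosis model. The dependency‐free sketch below shows how these quantities are defined for binary labels (1 = PAH, 0 = non‐PAH). It computes AUC as the probability that a randomly chosen PAH patient receives a higher score than a randomly chosen control, with ties counted as one half. The function names are illustrative. In practice, scikit‐learn's `roc_auc_score`, `precision_score` and `recall_score` compute the same quantities.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall (sensitivity) = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Read this way, the reported precision of 0.50 means that roughly one flagged patient in two belonged to the PAH cohort. The recall of 0.73 means that roughly three in four PAH patients were flagged at the chosen threshold. Precision also depends on how common PAH is in the evaluation data. The cohorts here had roughly one PAH patient per three controls, so at the same sensitivity and specificity precision would be lower in a population where PAH is rarer.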