A claims‐based, machine‐learning algorithm to identify patients with pulmonary arterial hypertension

Bibliographic Details
Main Authors: Bethany Hyde, Carly J. Paoli, Sumeet Panjabi, Katherine C. Bettencourt, Karimah S. Bell Lynum, Mona Selej
Format: Article
Language: English
Published: Wiley, 2023-04-01
Series: Pulmonary Circulation
Online Access: https://doi.org/10.1002/pul2.12237

Abstract
Many patients with pulmonary arterial hypertension (PAH) experience substantial delays in diagnosis, which is associated with worse outcomes and higher costs. Tools for diagnosing PAH sooner may lead to earlier treatment, which may delay disease progression and adverse outcomes including hospitalization and death. We developed a machine‐learning (ML) algorithm to identify patients at risk for PAH earlier in their symptom journey and distinguish them from patients with similar early symptoms not at risk for developing PAH. Our supervised ML model analyzed retrospective, de‐identified data from the US‐based Optum® Clinformatics® Data Mart claims database (January 2015 to December 2019). Propensity score matched PAH and non‐PAH (control) cohorts were established based on observed differences. Random forest models were used to classify patients as PAH or non‐PAH at diagnosis and at 6 months prediagnosis. The PAH and non‐PAH cohorts included 1339 and 4222 patients, respectively. At 6 months prediagnosis, the model performed well in distinguishing PAH and non‐PAH patients, with area under the curve of the receiver operating characteristic of 0.84, recall (sensitivity) of 0.73, and precision of 0.50. Key features distinguishing PAH from non‐PAH cohorts were a longer time between first symptom and the prediagnosis model date (i.e., 6 months before diagnosis); more diagnostic and prescription claims, circulatory claims, and imaging procedures, leading to higher overall healthcare resource utilization; and more hospitalizations. Our model distinguishes between patients with and without PAH at 6 months before diagnosis and illustrates the feasibility of using routine claims data to identify patients at a population level who might benefit from PAH‐specific screening and/or earlier specialist referral.
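The abstract reports an area under the ROC curve of 0.84, recall (sensitivity) of 0.73, and precision of 0.50 for the 6-month prediagnosis model. This record does not include the authors' code, so the Python sketch below only illustrates the standard definitions of these metrics on made-up toy labels and scores; the function names, the toy data, and the 0.5 decision threshold are all assumptions. It also works through what the reported figures imply arithmetically: a precision of 0.50 means roughly one false positive for every correctly flagged PAH patient, and combined with a recall of 0.73 it gives an F1 score of about 0.59 (a derived value, not one reported in the abstract).

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (precision, recall) for binary labels where 1 = PAH, 0 = non-PAH."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of flagged patients who truly have PAH
    recall = tp / (tp + fn) if tp + fn else 0.0  # share of PAH patients who get flagged
    return precision, recall


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive scores above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def false_positives_per_true_positive(precision: float) -> float:
    """Rearranges precision = TP / (TP + FP) into FP / TP = (1 - precision) / precision."""
    return (1.0 - precision) / precision


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical toy data, not study data: 1 = PAH, 0 = non-PAH.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.55, 0.3, 0.7, 0.6, 0.4, 0.2, 0.1, 0.05]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

    print(precision_recall(y_true, y_pred))  # (0.6, 0.75)
    print(round(roc_auc(y_true, scores), 3))  # 0.792

    # Arithmetic on the figures reported in the abstract.
    print(false_positives_per_true_positive(0.50))  # 1.0
    print(round(f1_score(0.50, 0.73), 2))  # 0.59
```

AUC is computed here with the pairwise (Mann-Whitney) formulation, which is easy to read but quadratic in the number of patients; for cohorts of the size described above, a sort-based implementation or an established library routine would be the more practical choice.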

Additional Details
Collection: Directory of Open Access Journals (DOAJ)
Record ID: doaj.art-342a74dda5fd48bca1f83585d3436964
ISSN: 2045-8940
Volume: 13
Issue: 2
Subjects: early diagnosis; rare disease; real‐world evidence

Author Affiliations
Bethany Hyde: Janssen Business Technology Commercial Data Insights & Data Science, Titusville, New Jersey, USA
Carly J. Paoli: Janssen Scientific Affairs, Inc., Titusville, New Jersey, USA
Sumeet Panjabi: Janssen Scientific Affairs, Inc., Titusville, New Jersey, USA
Katherine C. Bettencourt: Actelion Pharmaceuticals US, Inc., Titusville, New Jersey, USA
Karimah S. Bell Lynum: Actelion Pharmaceuticals US, Inc., Titusville, New Jersey, USA
Mona Selej: Janssen R&D Data Science, South San Francisco, California, USA