A machine-learning algorithm using claims data to identify patients with homozygous familial hypercholesterolemia

Abstract Homozygous familial hypercholesterolemia (HoFH) is an underdiagnosed and undertreated ultra-rare disease. We utilized claims data from the Komodo Healthcare Map database to develop a machine-learning model to identify potential HoFH patients. We tokenized patients enrolled in MyRARE (patien...


Bibliographic Details
Main Authors: Jing Gu, Matthew Epland, Xinshuo Ma, Jina Park, Robert J. Sanchez, Ying Li
Format: Article
Language: English
Published: Nature Portfolio 2024-04-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-024-58719-y
Collection: DOAJ
Description: Homozygous familial hypercholesterolemia (HoFH) is an underdiagnosed and undertreated ultra-rare disease. We utilized claims data from the Komodo Healthcare Map database to develop a machine-learning model to identify potential HoFH patients. We tokenized patients enrolled in MyRARE (patient support program for those prescribed evinacumab-dgnb in the United States) and linked them with their Komodo claims. A true positive HoFH cohort (n = 331) was formed by including patients from MyRARE and patients with prescriptions for evinacumab-dgnb or lomitapide. The negative cohort (n = 1423) comprised patients with or at risk for cardiovascular disease. We divided the cohort into an 80% training and 20% testing set. Overall, 10,616 candidate features were investigated; 87 were selected due to clinical relevance and importance on prediction performance. Different machine-learning algorithms were explored, with fast interpretable greedy-tree sums selected as the final machine-learning tool. This selection was based on its satisfactory performance and its easily interpretable nature. The model identified four useful features and yielded precision (positive predicted value) of 0.98, recall (sensitivity) of 0.88, area under the receiver operating characteristic curve of 0.98, and accuracy of 0.97. The model performed well in identifying HoFH patients in the testing set, providing a useful tool to facilitate HoFH screening and diagnosis via healthcare claims data.
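The 80%/20% split and the reported metrics can be illustrated with a minimal Python sketch. It is not the authors' code: the record does not say whether the split was stratified by class, so stratification is an assumption here, and the function names `stratified_split` and `binary_metrics` as well as the toy predictions are hypothetical. Only the cohort sizes (331 positive, 1423 negative) come from the description above. The area under the receiver operating characteristic curve is left out because it needs model scores, which the record does not provide.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices into train/test sets, sampling each class separately.

    Assumption: the split is stratified by class; the record only states
    that an 80%/20% split was used.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def binary_metrics(y_true, y_pred):
    """Compute precision, recall, and accuracy from binary labels (1 = HoFH)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # positive predictive value
        "recall": tp / (tp + fn) if tp + fn else 0.0,     # sensitivity
        "accuracy": (tp + tn) / len(pairs),
    }


# Cohort sizes taken from the description: 331 positive, 1423 negative patients.
labels = [1] * 331 + [0] * 1423
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))

# Hypothetical toy predictions, used only to show how the metrics are computed.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
toy_metrics = binary_metrics(toy_true, toy_pred)
print(toy_metrics)
```

With a negative cohort roughly four times larger than the positive one, accuracy alone can look high even when many positive patients are missed, which is one reason to read precision and recall alongside it.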
First indexed: 2024-04-24T07:15:55Z
ID: doaj.art-23330385ab674ae98c9062e6d60478c9
Institution: Directory Open Access Journal
ISSN: 2045-2322
Last indexed: 2024-04-24T07:15:55Z
Author affiliations: Jing Gu (Regeneron Pharmaceuticals, Inc.); Matthew Epland (Komodo Health); Xinshuo Ma (Komodo Health); Jina Park (Komodo Health); Robert J. Sanchez (Regeneron Pharmaceuticals, Inc.); Ying Li (Regeneron Pharmaceuticals, Inc.)
Citation: Scientific Reports, volume 14, issue 1, pages 1-8