A machine-learning algorithm using claims data to identify patients with homozygous familial hypercholesterolemia
Abstract Homozygous familial hypercholesterolemia (HoFH) is an underdiagnosed and undertreated ultra-rare disease. We utilized claims data from the Komodo Healthcare Map database to develop a machine-learning model to identify potential HoFH patients. We tokenized patients enrolled in MyRARE (patien...
Main Authors: | Jing Gu, Matthew Epland, Xinshuo Ma, Jina Park, Robert J. Sanchez, Ying Li |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio 2024-04-01 |
Series: | Scientific Reports |
Online Access: | https://doi.org/10.1038/s41598-024-58719-y |
_version_ | 1797199448352227328 |
---|---|
author | Jing Gu Matthew Epland Xinshuo Ma Jina Park Robert J. Sanchez Ying Li |
author_sort | Jing Gu |
collection | DOAJ |
description | Abstract Homozygous familial hypercholesterolemia (HoFH) is an underdiagnosed and undertreated ultra-rare disease. We utilized claims data from the Komodo Healthcare Map database to develop a machine-learning model to identify potential HoFH patients. We tokenized patients enrolled in MyRARE (patient support program for those prescribed evinacumab-dgnb in the United States) and linked them with their Komodo claims. A true positive HoFH cohort (n = 331) was formed by including patients from MyRARE and patients with prescriptions for evinacumab-dgnb or lomitapide. The negative cohort (n = 1423) comprised patients with or at risk for cardiovascular disease. We divided the cohort into an 80% training and 20% testing set. Overall, 10,616 candidate features were investigated; 87 were selected due to clinical relevance and importance on prediction performance. Different machine-learning algorithms were explored, with fast interpretable greedy-tree sums selected as the final machine-learning tool. This selection was based on its satisfactory performance and its easily interpretable nature. The model identified four useful features and yielded precision (positive predicted value) of 0.98, recall (sensitivity) of 0.88, area under the receiver operating characteristic curve of 0.98, and accuracy of 0.97. The model performed well in identifying HoFH patients in the testing set, providing a useful tool to facilitate HoFH screening and diagnosis via healthcare claims data. |
first_indexed | 2024-04-24T07:15:55Z |
format | Article |
id | doaj.art-23330385ab674ae98c9062e6d60478c9 |
institution | Directory Open Access Journal |
issn | 2045-2322 |
language | English |
last_indexed | 2024-04-24T07:15:55Z |
publishDate | 2024-04-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Reports |
spelling | doaj.art-23330385ab674ae98c9062e6d60478c9; 2024-04-21T11:15:57Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2024-04-01; 14; 1; 1; 8; 10.1038/s41598-024-58719-y; A machine-learning algorithm using claims data to identify patients with homozygous familial hypercholesterolemia; Jing Gu (Regeneron Pharmaceuticals, Inc.), Matthew Epland (Komodo Health), Xinshuo Ma (Komodo Health), Jina Park (Komodo Health), Robert J. Sanchez (Regeneron Pharmaceuticals, Inc.), Ying Li (Regeneron Pharmaceuticals, Inc.); https://doi.org/10.1038/s41598-024-58719-y |
title | A machine-learning algorithm using claims data to identify patients with homozygous familial hypercholesterolemia |
title_sort | machine learning algorithm using claims data to identify patients with homozygous familial hypercholesterolemia |
url | https://doi.org/10.1038/s41598-024-58719-y |
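
The abstract describes an 80%/20% train/test split of the combined cohort (331 positive, 1423 negative) and reports precision, recall, and accuracy. The following is a minimal Python sketch of those two steps, not the authors' code. The function names `stratified_split` and `classification_metrics` are hypothetical, and stratifying the split by class is an assumption, since the abstract does not say how the split was drawn.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices into train/test sets, keeping class proportions.

    Stratification is an assumption; the abstract only states 80%/20%.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return train_idx, test_idx


def classification_metrics(y_true, y_pred):
    """Compute precision, recall, and accuracy for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
    }


# Cohort sizes taken from the abstract: 331 positive, 1423 negative.
labels = [1] * 331 + [0] * 1423
train_idx, test_idx = stratified_split(labels)

# Toy predictions for illustration only; these are not the study's results.
toy_metrics = classification_metrics([1, 1, 0, 0], [1, 0, 0, 0])
```

Splitting within each class keeps the rare positive class represented in the test set at roughly the same rate as in the full cohort, which matters when positives are about a fifth of the data.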