Machine learning models to predict disease progression among veterans with hepatitis C virus.

Bibliographic Details
Main Authors: Monica A Konerman, Lauren A Beste, Tony Van, Boang Liu, Xuefei Zhang, Ji Zhu, Sameer D Saini, Grace L Su, Brahmajee K Nallamothu, George N Ioannou, Akbar K Waljee
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2019-01-01
Series: PLoS ONE
Online Access:https://doi.org/10.1371/journal.pone.0208141
collection DOAJ
description Background: Machine learning (ML) algorithms offer effective ways to build prediction models from longitudinal information because they can incorporate numerous predictor variables without compromising the accuracy of risk prediction. Building clinical risk prediction models for chronic hepatitis C virus (CHC) infection can be challenging because of the non-linear nature of disease progression. We developed and compared two ML algorithms to predict cirrhosis development in a large CHC-infected cohort using longitudinal data.
Methods and findings: We used national Veterans Health Administration (VHA) data to identify CHC patients in care between 2000 and 2016. The primary outcome was cirrhosis development, ascertained by two consecutive aspartate aminotransferase (AST)-to-platelet ratio index (APRI) values > 2 after time zero; this definition was chosen because liver biopsy is infrequent in clinical practice and APRI is a validated non-invasive biomarker of fibrosis in CHC. We excluded patients with an initial APRI > 2 or a pre-existing diagnosis of cirrhosis, hepatocellular carcinoma, or hepatic decompensation. Enrollment was defined as the date of the first APRI, and time zero as 2 years after enrollment. As a comparison, cross-sectional (CS) models used predictor values measured at, or closest before, time zero. Longitudinal models used the CS predictors plus longitudinal summary variables (maximum, minimum, maximum of slope, minimum of slope, and total variation) computed between enrollment and time zero. Covariates included demographics, laboratory values, and body mass index. Model performance was evaluated using concordance and the area under the receiver operating characteristic curve (AuROC). A total of 72,683 individuals with CHC were analyzed; the cohort had a mean age of 52.8 years and was 96.8% male and 53% white. Over a mean follow-up of 7 years, 11,616 individuals (16%) met the primary outcome. The longitudinal Cox model outperformed the CS Cox model (concordance 0.764 vs 0.746), and the longitudinal boosted-survival-tree model outperformed the longitudinal linear Cox model (concordance 0.774 vs 0.764). Based on AuROC, the longitudinal models were also more accurate than the CS model at 1, 3, and 5 years after time zero.
Conclusions: Boosted-survival-tree models using longitudinal information are statistically superior to cross-sectional or linear models for predicting the development of cirrhosis in CHC, although all four models were highly accurate. Similar statistical methods could be applied to predict outcomes in other chronic diseases with non-linear progression.
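The outcome definition can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: it uses the standard APRI formula, (AST / AST upper limit of normal) / platelet count (10^9/L) × 100, with a hypothetical AST upper limit of normal of 40 IU/L; it approximates time zero as day 730 after enrollment; and it dates the event to the second of the two consecutive APRI values above 2. The names `LabDraw`, `apri`, and `cirrhosis_event_day` are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical AST upper limit of normal (IU/L); labs differ and the study's value is not given here.
AST_ULN_IU_PER_L = 40.0
APRI_THRESHOLD = 2.0
# Assumption: time zero = 2 years after enrollment, approximated as 730 days.
TIME_ZERO_DAY = 730.0


def apri(ast_iu_per_l: float, platelets_1e9_per_l: float,
         ast_uln: float = AST_ULN_IU_PER_L) -> float:
    """AST-to-platelet ratio index: (AST / ULN) / platelets (10^9/L) * 100."""
    if platelets_1e9_per_l <= 0:
        raise ValueError("platelet count must be positive")
    return (ast_iu_per_l / ast_uln) / platelets_1e9_per_l * 100.0


@dataclass
class LabDraw:
    day: float        # days since enrollment (date of the first APRI)
    ast: float        # AST in IU/L
    platelets: float  # platelet count in 10^9/L


def cirrhosis_event_day(draws: Sequence[LabDraw],
                        time_zero_day: float = TIME_ZERO_DAY,
                        threshold: float = APRI_THRESHOLD) -> Optional[float]:
    """Day of the second of two consecutive post-time-zero APRI values > threshold,
    or None if the outcome is never met (the patient is censored)."""
    after_t0 = sorted((d for d in draws if d.day > time_zero_day), key=lambda d: d.day)
    above = [apri(d.ast, d.platelets) > threshold for d in after_t0]
    for k in range(1, len(after_t0)):
        if above[k - 1] and above[k]:
            return after_t0[k].day
    return None


if __name__ == "__main__":
    history = [
        LabDraw(day=100, ast=200, platelets=50),   # before time zero: ignored
        LabDraw(day=800, ast=120, platelets=100),  # APRI 3.0
        LabDraw(day=850, ast=40, platelets=200),   # APRI 0.5 breaks the run
        LabDraw(day=900, ast=100, platelets=80),   # APRI 3.125
        LabDraw(day=950, ast=90, platelets=90),    # APRI 2.5, second consecutive value > 2
    ]
    print(cirrhosis_event_day(history))  # 950
```

Requiring two consecutive values, rather than a single one, mirrors the abstract's definition and makes the outcome less sensitive to one transiently elevated AST or low platelet count.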
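The five longitudinal summary variables can be illustrated for a single predictor. The abstract names the summaries but does not define them, so this Python sketch assumes that slopes are computed between consecutive measurements (change in value divided by elapsed days), that total variation is the sum of absolute changes between consecutive measurements, and that only measurements from enrollment (day 0) through time zero (approximated as day 730) are used. The cross-sectional value is taken as the last measurement in that window. The function name and output keys are hypothetical.

```python
from typing import Dict, Sequence

# Assumption: time zero = 2 years after enrollment, approximated as 730 days.
TIME_ZERO_DAY = 730.0


def longitudinal_summary(days: Sequence[float], values: Sequence[float],
                         time_zero_day: float = TIME_ZERO_DAY) -> Dict[str, float]:
    """Summarize one predictor's trajectory between enrollment (day 0) and time zero."""
    if len(days) != len(values):
        raise ValueError("days and values must have the same length")
    window = sorted((t, v) for t, v in zip(days, values) if 0 <= t <= time_zero_day)
    if not window:
        raise ValueError("no measurements between enrollment and time zero")
    t = [p[0] for p in window]
    v = [p[1] for p in window]
    # Slope between each pair of consecutive measurements (per day); same-day pairs are skipped.
    slopes = [(v[i + 1] - v[i]) / (t[i + 1] - t[i])
              for i in range(len(v) - 1) if t[i + 1] > t[i]]
    return {
        "cross_sectional": v[-1],  # value at, or closest before, time zero
        "max": max(v),
        "min": min(v),
        # With fewer than two distinct dates no slope exists; 0.0 is an arbitrary placeholder.
        "max_slope": max(slopes) if slopes else 0.0,
        "min_slope": min(slopes) if slopes else 0.0,
        "total_variation": sum(abs(v[i + 1] - v[i]) for i in range(len(v) - 1)),
    }


if __name__ == "__main__":
    # Hypothetical platelet counts (10^9/L); the day-800 value falls after time zero and is ignored.
    summary = longitudinal_summary([0, 100, 300, 700, 800], [180, 170, 176, 164, 90])
    print(summary)
```

In this sketch a CS model would use only `cross_sectional`, while a longitudinal model would use every key, which is how summaries of the trajectory, not just its endpoint, enter the predictor set.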
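The concordance values in the abstract (0.746, 0.764, 0.774) summarize how well each model ranks patients by risk. The abstract does not name the estimator; assuming Harrell's C-index, a minimal O(n²) Python version follows. A pair counts only when the patient with the shorter follow-up had an observed event, and tied risk scores count as half-concordant. The quadratic loop is for illustration only; a cohort of 72,683 patients would call for the faster implementations in dedicated survival-analysis libraries such as scikit-survival's `concordance_index_censored`.

```python
from itertools import combinations
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[bool],
                    risk_scores: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored data.

    times:       follow-up time for each patient
    events:      True if the outcome (e.g. cirrhosis) was observed, False if censored
    risk_scores: higher score = higher predicted risk (e.g. a Cox linear predictor)
    """
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("inputs must have the same length")
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # this simple version skips tied follow-up times
        a, b = (i, j) if times[i] < times[j] else (j, i)  # a has the shorter follow-up
        if not events[a]:
            continue  # earlier patient was censored, so the true ordering is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    times = [2.0, 5.0, 3.0, 8.0, 6.0]
    events = [True, True, False, False, True]
    risk = [0.9, 0.4, 0.7, 0.1, 0.5]
    print(harrell_c_index(times, events, risk))  # 6 of 7 comparable pairs concordant
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported gain from 0.746 to 0.774 reflects a modest but consistent improvement in ordering patients by time to cirrhosis.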
first_indexed 2024-04-11T18:12:19Z
id doaj.art-b76fb559e94243198753ba5237f302c7
institution Directory Open Access Journal
issn 1932-6203
last_indexed 2024-04-11T18:12:19Z