Empirical analyses and simulations showed that different machine and statistical learning methods had differing performance for predicting blood pressure

Abstract Machine learning is increasingly being used to predict clinical outcomes. Most comparisons of different methods have been based on empirical analyses in specific datasets. We used Monte Carlo simulations to determine when machine learning methods perform better than statistical learning met...


Bibliographic Details
Main Authors:	Peter C. Austin, Frank E. Harrell, Douglas S. Lee, Ewout W. Steyerberg
Format:	Article
Language:	English
Published:	Nature Portfolio 2022-06-01
Series:	Scientific Reports
Online Access:	https://doi.org/10.1038/s41598-022-13015-5

collection	DOAJ
description	Abstract Machine learning is increasingly being used to predict clinical outcomes. Most comparisons of different methods have been based on empirical analyses in specific datasets. We used Monte Carlo simulations to determine when machine learning methods perform better than statistical learning methods in a specific setting. We evaluated six learning methods: stochastic gradient boosting machines using trees as the base learners, random forests, artificial neural networks, the lasso, ridge regression, and linear regression estimated using ordinary least squares (OLS). Our simulations were informed by empirical analyses in patients with acute myocardial infarction (AMI) and congestive heart failure (CHF) and used six data-generating processes, each based on one of the six learning methods, to simulate continuous outcomes in the derivation and validation samples. The outcome was systolic blood pressure at hospital discharge, a continuous outcome. We applied the six learning methods in each of the simulated derivation samples and evaluated performance in the simulated validation samples. The primary observation was that neural networks tended to result in estimates with worse predictive accuracy than the other five methods in both disease samples and across all six data-generating processes. Boosted trees and OLS regression tended to perform well across a range of scenarios.
first_indexed	2024-04-12T13:18:28Z
format	Article
id	doaj.art-5ea96a4f98254d9581c5a4db9ccbd416
institution	Directory Open Access Journal
issn	2045-2322
language	English
last_indexed	2024-04-12T13:18:28Z
publishDate	2022-06-01
publisher	Nature Portfolio
record_format	Article
series	Scientific Reports
spelling	doaj.art-5ea96a4f98254d9581c5a4db9ccbd416; 2022-12-22T03:31:36Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2022-06-01; 121111; 10.1038/s41598-022-13015-5; Peter C. Austin (ICES); Frank E. Harrell (Department of Biostatistics, Vanderbilt University School of Medicine); Douglas S. Lee (ICES); Ewout W. Steyerberg (Department of Biomedical Data Sciences, Leiden University Medical Centre)
title	Empirical analyses and simulations showed that different machine and statistical learning methods had differing performance for predicting blood pressure
url	https://doi.org/10.1038/s41598-022-13015-5
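
The derivation/validation design described in the abstract can be illustrated with a minimal sketch. This is not the authors' code or setup: it assumes a single predictor, a hypothetical linear data-generating process (the intercept of 120, slope of 5, and noise standard deviation of 15 are invented for illustration), and compares only two of the six methods (OLS and ridge regression, with an arbitrary penalty of 10). Each replicate simulates a derivation sample and a validation sample, fits both methods on the former, and records mean squared error on the latter.

```python
import random
import statistics


def simulate(n, slope, intercept, noise_sd, rng):
    """Draw n (x, y) pairs from a hypothetical linear data-generating process."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [intercept + slope * x + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def fit_linear(xs, ys, penalty=0.0):
    """Fit y = a + b*x. penalty=0 gives OLS; penalty>0 gives ridge
    regression with the penalty applied to the slope only."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / (sxx + penalty)
    a = y_bar - b * x_bar
    return a, b


def mse(model, xs, ys):
    """Mean squared prediction error of a fitted (a, b) model."""
    a, b = model
    return statistics.fmean((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def monte_carlo(n_reps=200, n_derivation=50, n_validation=500, seed=2022):
    """Average validation-sample MSE of each method over n_reps replicates."""
    rng = random.Random(seed)
    penalties = {"ols": 0.0, "ridge": 10.0}  # hypothetical penalty value
    results = {name: [] for name in penalties}
    for _ in range(n_reps):
        # Derivation and validation samples come from the same process.
        dx, dy = simulate(n_derivation, 5.0, 120.0, 15.0, rng)
        vx, vy = simulate(n_validation, 5.0, 120.0, 15.0, rng)
        for name, penalty in penalties.items():
            model = fit_linear(dx, dy, penalty)
            results[name].append(mse(model, vx, vy))
    return {name: statistics.fmean(vals) for name, vals in results.items()}


summary = monte_carlo()
```

Here `summary` maps each method name to its average validation MSE. In the study itself, the same loop would be repeated for each of the six data-generating processes and all six learning methods; swapping in a different `simulate` function is how this sketch would mimic a different data-generating process.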

Similar Items