Relative Performance of Machine Learning and Linear Regression in Predicting Quality of Life and Academic Performance of School Children in Norway: Data Analysis of a Quasi-Experimental Study

Background: Machine learning techniques are increasingly being applied in health research. It is not clear how useful these approaches are for modeling continuous outcomes. Child quality of life is associated with parental socioeconomic status and physical activity and may be a...


Bibliographic Details
Main Authors: Robert Froud, Solveig Hakestad Hansen, Hans Kristian Ruud, Jonathan Foss, Leila Ferguson, Per Morten Fredriksen
Format: Article
Language: English
Published: JMIR Publications 2021-07-01
Series: Journal of Medical Internet Research
Online Access: https://www.jmir.org/2021/7/e22021
collection DOAJ
description Background: Machine learning techniques are increasingly being applied in health research. It is not clear how useful these approaches are for modeling continuous outcomes. Child quality of life is associated with parental socioeconomic status and physical activity and may be associated with aerobic fitness and strength. It is unclear whether diet or academic performance is associated with quality of life.
Objective: The purpose of this study was to compare the predictive performance of machine learning techniques with that of linear regression in examining the extent to which continuous outcomes (physical activity, aerobic fitness, muscular strength, diet, and parental education) are predictive of academic performance and quality of life and whether academic performance and quality of life are associated.
Methods: We modeled data from children attending 9 schools in a quasi-experimental study. We split data randomly into training and validation sets. Curvilinear, nonlinear, and heteroscedastic variables were simulated to examine the performance of machine learning techniques compared to that of linear models, with and without imputation.
Results: We included data for 1711 children. Regression models explained 24% of academic performance variance in the real complete-case validation set, and up to 15% of quality of life variance. While machine learning techniques explained high proportions of variance in training sets, in validation they explained approximately 0% of academic performance variance and 3% to 8% of quality of life variance. With imputation, machine learning techniques improved to 15% for academic performance. Machine learning outperformed regression for simulated nonlinear and heteroscedastic variables. The best predictors of academic performance in adjusted models were the child’s mother having a master-level education (P<.001; β=1.98, 95% CI 0.25 to 3.71), increased television and computer use (P=.03; β=1.19, 95% CI 0.25 to 3.71), and dichotomized self-reported exercise (P=.001; β=2.47, 95% CI 1.08 to 3.87). For quality of life, self-reported exercise (P<.001; β=1.09, 95% CI 0.53 to 1.66) and increased television and computer use (P=.002; β=−0.95, 95% CI −1.55 to −0.36) were the best predictors. Adjusted academic performance was associated with quality of life (P=.02; β=0.12, 95% CI 0.02 to 0.22).
Conclusions: Linear regression was less prone to overfitting and outperformed commonly used machine learning techniques. Imputation improved the performance of machine learning, but not sufficiently to outperform regression. Machine learning techniques outperformed linear regression for modeling nonlinear and heteroscedastic relationships and may be of use in such cases. Regression with splines performed almost as well in nonlinear modeling. Lifestyle variables, including physical exercise, television and computer use, and parental education, are predictive of academic performance or quality of life. Academic performance is associated with quality of life after adjusting for lifestyle variables and may offer another promising intervention target to improve quality of life in children.
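The comparison summarized in the Methods and Results (a random split into training and validation sets, with a simulated nonlinear variable) can be illustrated with a minimal sketch. This is not the authors' analysis: the abstract does not name the machine learning techniques used, so k-nearest-neighbour regression stands in here as a hypothetical example, and the sine-shaped relationship, noise level, sample size, and all function names are assumptions chosen only for illustration.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Proportion of variance explained (can be negative out of sample)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_linear(x, y):
    """Ordinary least squares with an intercept and one predictor."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_linear(coef, x):
    return coef[0] + coef[1] * x


def predict_knn(x_train, y_train, x_new, k):
    """k-nearest-neighbour regression: average the k closest training outcomes."""
    dist = np.abs(x_new[:, None] - x_train[None, :])
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


def compare(n=400, k=1, seed=0):
    """Simulate a nonlinear predictor, split randomly, and report R^2 values."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-3.0, 3.0, size=n)
    # Assumed data-generating process: a sine curve plus Gaussian noise.
    y = np.sin(2.0 * x) + rng.normal(0.0, 0.5, size=n)

    # Random split into training and validation halves.
    idx = rng.permutation(n)
    train, valid = idx[: n // 2], idx[n // 2:]

    coef = fit_linear(x[train], y[train])
    return {
        "linear_train": r_squared(y[train], predict_linear(coef, x[train])),
        "linear_valid": r_squared(y[valid], predict_linear(coef, x[valid])),
        "knn_train": r_squared(y[train], predict_knn(x[train], y[train], x[train], k)),
        "knn_valid": r_squared(y[valid], predict_knn(x[train], y[train], x[valid], k)),
    }


if __name__ == "__main__":
    for k in (1, 10):
        print(k, compare(k=k))
```

With `k=1` the flexible model reproduces its training data almost exactly, yet that fit does not carry over to the validation half, which is the overfitting pattern the abstract reports. A larger `k` typically gives up some training fit in exchange for better validation performance on this simulated nonlinear variable, where a straight line explains little.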
first_indexed 2024-03-12T13:04:59Z
format Article
id doaj.art-00f0153687ac4be5938bd488db919c80
institution Directory Open Access Journal
issn 1438-8871
language English
last_indexed 2024-03-12T13:04:59Z
publishDate 2021-07-01
publisher JMIR Publications
record_format Article
series Journal of Medical Internet Research
spelling doaj.art-00f0153687ac4be5938bd488db919c80 | 2023-08-28T17:04:52Z | eng | JMIR Publications | Journal of Medical Internet Research | 1438-8871 | 2021-07-01 | 23 | 7 | e22021 | 10.2196/22021 | Relative Performance of Machine Learning and Linear Regression in Predicting Quality of Life and Academic Performance of School Children in Norway: Data Analysis of a Quasi-Experimental Study | Robert Froud (https://orcid.org/0000-0002-9193-2297) | Solveig Hakestad Hansen (https://orcid.org/0000-0001-5608-6219) | Hans Kristian Ruud (https://orcid.org/0000-0003-1395-7680) | Jonathan Foss (https://orcid.org/0000-0003-4106-5583) | Leila Ferguson (https://orcid.org/0000-0001-9115-5662) | Per Morten Fredriksen (https://orcid.org/0000-0001-7450-2925) | https://www.jmir.org/2021/7/e22021
url https://www.jmir.org/2021/7/e22021