Handling Overlapping Asymmetric Data Sets—A Twice Penalized P-Spline Approach

Aims: Overlapping asymmetric data sets are where a large cohort of observations have a small amount of information recorded, and within this group there exists a smaller cohort which have extensive further information available. Missing imputation is unwise if cohort size differs...

Bibliographic Details
Main Authors:	Matthew McTeer, Robin Henderson, Quentin M. Anstee, Paolo Missier
Format:	Article
Language:	English
Published:	MDPI AG 2024-03-01
Series:	Mathematics
Subjects:	P-Spline, penalized regression, smoothing, asymmetric data, B-Spline, non-Parametric
Online Access:	https://www.mdpi.com/2227-7390/12/5/777

author	Matthew McTeer, Robin Henderson, Quentin M. Anstee, Paolo Missier
collection	DOAJ
description	Aims: Overlapping asymmetric data sets are where a large cohort of observations have a small amount of information recorded, and within this group there exists a smaller cohort which have extensive further information available. Missing imputation is unwise if cohort size differs substantially; therefore, we aim to develop a way of modelling the smaller cohort whilst considering the larger. Methods: Through considering traditionally once penalized P-Spline approximations, we create a second penalty term through observing discrepancies in the marginal value of covariates that exist in both cohorts. Our now twice penalized P-Spline is designed to firstly prevent over/under-fitting of the smaller cohort and secondly to consider the larger cohort. Results: Through a series of data simulations, penalty parameter tunings, and model adaptations, our twice penalized model offers up to a 58% and 46% improvement in model fit upon a continuous and binary response, respectively, against existing B-Spline and once penalized P-Spline methods. Applying our model to an individual’s risk of developing steatohepatitis, we report an over 65% improvement over existing methods. Conclusions: We propose a twice penalized P-Spline method which can vastly improve the model fit of overlapping asymmetric data sets upon a common predictive endpoint, without the need for missing data imputation.
first_indexed	2024-04-25T00:24:34Z
format	Article
id	doaj.art-936066c9cf9a4e22994fca26259b2eff
institution	Directory Open Access Journal
issn	2227-7390
language	English
last_indexed	2024-04-25T00:24:34Z
publishDate	2024-03-01
publisher	MDPI AG
record_format	Article
series	Mathematics
spelling	doaj.art-936066c9cf9a4e22994fca26259b2eff | 2024-03-12T16:50:19Z | eng | MDPI AG | Mathematics | 2227-7390 | 2024-03-01 | 12 | 5 | 777 | 10.3390/math12050777 | Matthew McTeer (0), Robin Henderson (1), Quentin M. Anstee (2), Paolo Missier (3) | (0) School of Computing, Faculty of Science, Agriculture & Engineering, Newcastle University, Newcastle upon Tyne NE1 7RU, UK | (1) School of Mathematics, Statistics and Physics, Faculty of Science, Agriculture & Engineering, Newcastle University, Newcastle upon Tyne NE1 7RU, UK | (2) Translational & Clinical Research Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, UK | (3) School of Computer Science, University of Birmingham, Birmingham B15 2TT, UK | https://www.mdpi.com/2227-7390/12/5/777 | P-Spline, penalized regression, smoothing, asymmetric data, B-Spline, non-Parametric
title	Handling Overlapping Asymmetric Data Sets—A Twice Penalized P-Spline Approach
topic	P-Spline, penalized regression, smoothing, asymmetric data, B-Spline, non-Parametric
url	https://www.mdpi.com/2227-7390/12/5/777

Similar Items