Calibration and XGBoost reweighting to reduce coverage and non-response biases in overlapping panel surveys: application to the Healthcare and Social Survey

Bibliographic Details
Main Authors: Luis Castro, María del Mar Rueda, Carmen Sánchez-Cantalejo, Ramón Ferri, Andrés Cabrera-León
Format: Article
Language: English
Published: BMC 2024-02-01
Series: BMC Medical Research Methodology
Subjects: Public health; COVID-19; Panel surveys; Sampling; Machine learning; Non-response bias
Online Access: https://doi.org/10.1186/s12874-024-02171-z
collection DOAJ
description Abstract
Background: Surveys have been used worldwide to measure the impact of the COVID-19 pandemic and to help prepare and deliver an effective public health response. Overlapping panel surveys yield longitudinal estimates as well as more accurate cross-sectional estimates, thanks to the larger sample size. However, non-response is particularly severe in panel surveys because of population fatigue with repeated surveys.
Objective: To develop a new reweighting method for overlapping panel surveys affected by non-response.
Methods: We chose the Healthcare and Social Survey, which has an overlapping panel design with measurements throughout 2020 and 2021 and random samples stratified by province and degree of urbanization. Each measurement comprises two samples: a longitudinal sample taken from previous measurements and a new sample taken at each measurement.
Results: Our reweighting approach is a two-step process. The original sampling design weights are first corrected by modelling non-response with respect to the longitudinal sample obtained in a previous measurement, using machine learning techniques; the corrected weights are then calibrated using the auxiliary information available at the population level. The method is applied to the estimation of totals, proportions, ratios, and differences between measurements, and to gender gaps in self-perceived general health.
Conclusion: The proposed method produces suitable estimators for both cross-sectional and longitudinal samples. To address future health crises such as COVID-19, potential coverage and non-response biases in surveys need to be reduced with reweighting techniques such as those proposed in this study.
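The two-step reweighting summarized in the description can be illustrated with a minimal sketch. It is not the authors' implementation: the response propensity is estimated here with a simple weighting-class model (the design-weighted response rate within a cell) as a stand-in for the XGBoost classifier named in the title, and the calibration step uses plain linear calibration. All data, variable names, and population totals below are hypothetical.

```python
import numpy as np

def response_propensity_by_cell(cells, responded, design_w):
    """Design-weighted response rate within each cell.

    Stand-in for a fitted classifier: any model returning a response
    probability per sampled unit could be plugged in instead.
    """
    cells = np.asarray(cells)
    responded = np.asarray(responded, dtype=float)
    design_w = np.asarray(design_w, dtype=float)
    p = np.empty(len(cells), dtype=float)
    for c in np.unique(cells):
        m = cells == c
        p[m] = np.sum(design_w[m] * responded[m]) / np.sum(design_w[m])
    return p

def linear_calibration(weights, X, totals):
    """Linear calibration: w_i * (1 + x_i' lam), with lam chosen so that
    the calibrated weights reproduce the known totals of the columns of X."""
    weights = np.asarray(weights, dtype=float)
    X = np.asarray(X, dtype=float)
    totals = np.asarray(totals, dtype=float)
    T = X.T @ (weights[:, None] * X)
    lam = np.linalg.solve(T, totals - X.T @ weights)
    return weights * (1.0 + X @ lam)

# Hypothetical toy data: 8 units sampled at a previous measurement.
design_w = np.full(8, 100.0)                       # original design weights
age_cell = np.array([0, 0, 0, 0, 1, 1, 1, 1])      # auxiliary variable
responded = np.array([1, 1, 1, 0, 1, 1, 0, 0])     # 1 = answered again
female = np.array([1, 0, 1, 0, 1, 0, 1, 0])

# Step 1: correct design weights for non-response (inverse propensity).
propensity = response_propensity_by_cell(age_cell, responded, design_w)
r = responded == 1
w_nr = design_w[r] / propensity[r]

# Step 2: calibrate to assumed population totals (women, men).
X = np.column_stack([female[r], 1 - female[r]]).astype(float)
pop_totals = np.array([410.0, 390.0])
w_cal = linear_calibration(w_nr, X, pop_totals)

# Weighted proportions and a gender gap for a hypothetical binary outcome
# (1 = good self-perceived health), observed for respondents only.
good_health = np.array([1, 0, 0, 1, 1], dtype=float)
is_f = female[r] == 1
p_women = np.sum(w_cal[is_f] * good_health[is_f]) / np.sum(w_cal[is_f])
p_men = np.sum(w_cal[~is_f] * good_health[~is_f]) / np.sum(w_cal[~is_f])
gender_gap = p_women - p_men
```

Linear calibration is used only because it has a closed form; it can produce negative weights on less well-behaved data, which is one reason bounded or raking-type calibration is often preferred in practice.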
first_indexed 2024-03-07T14:55:51Z
format Article
id doaj.art-ce8fbc85af3141e68213ce0a203c2047
institution Directory Open Access Journal
issn 1471-2288
language English
last_indexed 2024-03-07T14:55:51Z
publishDate 2024-02-01
publisher BMC
record_format Article
series BMC Medical Research Methodology
spelling doaj.art-ce8fbc85af3141e68213ce0a203c2047 2024-03-05T19:28:34Z eng BMC BMC Medical Research Methodology 1471-2288 2024-02-01 241119 10.1186/s12874-024-02171-z
Luis Castro (Department of Statistics and Operational Research, University of Granada)
María del Mar Rueda (Department of Statistics and Operational Research, University of Granada)
Carmen Sánchez-Cantalejo (Department of Public Health, Andalusian School of Public Health)
Ramón Ferri (Department of Statistics and Operational Research, University of Granada)
Andrés Cabrera-León (Department of Public Health, Andalusian School of Public Health)
title Calibration and XGBoost reweighting to reduce coverage and non-response biases in overlapping panel surveys: application to the Healthcare and Social Survey
topic Public health
COVID-19
Panel surveys
Sampling
Machine learning
Non-response bias
url https://doi.org/10.1186/s12874-024-02171-z