On the Use of Gradient Boosting Methods to Improve the Estimation with Data Obtained with Self-Selection Procedures
In recent years, web surveys have established themselves as one of the main methods in empirical research. However, coverage and selection bias in such surveys have undercut their utility for statistical inference in finite populations. To compensate for these biases, researchers have...
Main Authors: | Luis Castro-Martín, María del Mar Rueda, Ramón Ferri-García, César Hernando-Tamayo |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-11-01 |
Series: | Mathematics |
Subjects: | nonprobability surveys; machine learning techniques; propensity score adjustment; survey sampling |
Online Access: | https://www.mdpi.com/2227-7390/9/23/2991 |
_version_ | 1797507421176856576 |
---|---|
author | Luis Castro-Martín; María del Mar Rueda; Ramón Ferri-García; César Hernando-Tamayo |
author_facet | Luis Castro-Martín; María del Mar Rueda; Ramón Ferri-García; César Hernando-Tamayo |
author_sort | Luis Castro-Martín |
collection | DOAJ |
description | In recent years, web surveys have established themselves as one of the main methods in empirical research. However, coverage and selection bias in such surveys have undercut their utility for statistical inference in finite populations. To compensate for these biases, researchers have employed a variety of statistical techniques to adjust nonprobability samples so that they more closely match the population. In this study, we test the potential of the XGBoost algorithm in the main estimation methods that integrate data from a probability survey and a nonprobability survey. We also compare how effectively these methods remove bias. The results show that the four proposed estimators based on gradient boosting frameworks can improve survey representativity relative to other classic prediction methods. The proposed methodology is also used to analyze a real nonprobability survey sample on the social effects of COVID-19. |
first_indexed | 2024-03-10T04:49:15Z |
format | Article |
id | doaj.art-47160649c3f64efd8d20d88a9f777391 |
institution | Directory of Open Access Journals |
issn | 2227-7390 |
language | English |
last_indexed | 2024-03-10T04:49:15Z |
publishDate | 2021-11-01 |
publisher | MDPI AG |
record_format | Article |
series | Mathematics |
spelling | doaj.art-47160649c3f64efd8d20d88a9f777391; 2023-11-23T02:44:22Z; eng; MDPI AG; Mathematics; 2227-7390; 2021-11-01; 9; 23; 2991; 10.3390/math9232991; On the Use of Gradient Boosting Methods to Improve the Estimation with Data Obtained with Self-Selection Procedures; Luis Castro-Martín (0), María del Mar Rueda (1), Ramón Ferri-García (2), César Hernando-Tamayo (3); affiliation of all four authors: Department of Statistics and Operational Research, University of Granada, 18011 Granada, Spain; In recent years, web surveys have established themselves as one of the main methods in empirical research. However, coverage and selection bias in such surveys have undercut their utility for statistical inference in finite populations. To compensate for these biases, researchers have employed a variety of statistical techniques to adjust nonprobability samples so that they more closely match the population. In this study, we test the potential of the XGBoost algorithm in the main estimation methods that integrate data from a probability survey and a nonprobability survey. We also compare how effectively these methods remove bias. The results show that the four proposed estimators based on gradient boosting frameworks can improve survey representativity relative to other classic prediction methods. The proposed methodology is also used to analyze a real nonprobability survey sample on the social effects of COVID-19.; https://www.mdpi.com/2227-7390/9/23/2991; nonprobability surveys; machine learning techniques; propensity score adjustment; survey sampling |
title | On the Use of Gradient Boosting Methods to Improve the Estimation with Data Obtained with Self-Selection Procedures |
topic | nonprobability surveys; machine learning techniques; propensity score adjustment; survey sampling |
url | https://www.mdpi.com/2227-7390/9/23/2991 |