Using machine learning to impute legal status of immigrants in the National Health Interview Survey

We describe a novel machine learning method of imputing legal status for immigrants using nationally representative survey data from the Survey of Income and Program Participation (SIPP) and the National Health Interview Survey (NHIS). K-nearest Neighbor (KNN) classifier and Random Forest (RF) Algor...

Bibliographic Details
Main Authors: Simon A. Ruhnke, Fernando A. Wilson, Jim P. Stimpson
Format: Article
Language: English
Published: Elsevier 2022-01-01
Series: MethodsX
Subjects: Random Forest machine learning
Online Access: http://www.sciencedirect.com/science/article/pii/S221501612200228X
_version_ 1811187210122166272
author Simon A. Ruhnke
Fernando A. Wilson
Jim P. Stimpson
author_sort Simon A. Ruhnke
collection DOAJ
description We describe a novel machine learning method for imputing the legal status of immigrants using nationally representative survey data from the Survey of Income and Program Participation (SIPP) and the National Health Interview Survey (NHIS). Two machine learning methods, the K-nearest Neighbor (KNN) classifier and the Random Forest (RF) algorithm, are described as novel imputation methods and compared with established regression-based imputation. When the imputation methods were validated using sensitivity, specificity, positive predictive value (PPV) and accuracy statistics, the Random Forest algorithm was more accurate in identifying undocumented immigrants and, relative to regression-based imputation and KNN, minimized bias in both the socio-demographic variables included in the imputation and the unobserved health variables. • We developed a new machine learning method of imputing legal status for immigrants that can be used with nationally representative, publicly available data. • Our findings indicate that using machine learning, specifically the Random Forest algorithm, to impute the legal status of immigrants was more accurate in identifying undocumented immigrants and minimized bias relative to other imputation methods.
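The four validation statistics named in the description can be illustrated with a minimal sketch. This is not the authors' code: the function name `validation_stats`, the label coding (1 = undocumented, 0 = documented) and the toy label lists are all assumptions made for illustration, and the numbers are not results from the article.

```python
def validation_stats(true_labels, imputed_labels):
    """Sensitivity, specificity, PPV and accuracy for binary labels.

    Assumed coding (hypothetical): 1 = undocumented, 0 = documented.
    """
    pairs = list(zip(true_labels, imputed_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Return NaN instead of raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, len(pairs)),
    }


# Toy labels, invented purely for illustration.
true_status = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
imputed_status = [1, 1, 0, 0, 0, 1, 0, 0, 0, 1]

stats = validation_stats(true_status, imputed_status)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

In a validation setting of the kind the description outlines, the same function could be applied to the imputed labels from each method (regression-based, KNN, RF) against a sample where the true status is observed, so that the methods can be compared on the same four statistics.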
first_indexed 2024-04-11T13:58:19Z
format Article
id doaj.art-a307be80291343d8a6f0c01fa004f43a
institution Directory Open Access Journal
issn 2215-0161
language English
last_indexed 2024-04-11T13:58:19Z
publishDate 2022-01-01
publisher Elsevier
record_format Article
series MethodsX
spelling doaj.art-a307be80291343d8a6f0c01fa004f43a; 2022-12-22T04:20:11Z; eng; Elsevier; MethodsX; 2215-0161; 2022-01-01; 9; 101848; Using machine learning to impute legal status of immigrants in the National Health Interview Survey; Simon A. Ruhnke (Berliner Institut für empirische Integrations- und Migrationsforschung/BIM, Berlin, Germany); Fernando A. Wilson (University of Utah, Matheson Center for Health Care Studies, Salt Lake City, UT); Jim P. Stimpson (Drexel University, Department of Health Management and Policy, PA, USA; corresponding author); http://www.sciencedirect.com/science/article/pii/S221501612200228X; Random Forest machine learning
title Using machine learning to impute legal status of immigrants in the National Health Interview Survey
title_sort using machine learning to impute legal status of immigrants in the national health interview survey
topic Random Forest machine learning
url http://www.sciencedirect.com/science/article/pii/S221501612200228X