Using machine learning to impute legal status of immigrants in the National Health Interview Survey

Bibliographic Details
Main Authors:	Simon A. Ruhnke, Fernando A. Wilson, Jim P. Stimpson
Format:	Article
Language:	English
Published:	Elsevier 2022-01-01
Series:	MethodsX
Subjects:	Random Forest machine learning
Online Access:	http://www.sciencedirect.com/science/article/pii/S221501612200228X

collection	DOAJ
description	We describe a novel machine learning method for imputing the legal status of immigrants using nationally representative survey data from the Survey of Income and Program Participation (SIPP) and the National Health Interview Survey (NHIS). We present the K-nearest neighbor (KNN) classifier and the Random Forest (RF) algorithm as novel imputation methods and compare them with established regression-based imputation. We validated the imputation methods using sensitivity, specificity, positive predictive value (PPV), and accuracy. Relative to regression-based imputation and KNN, the Random Forest algorithm was more accurate in identifying undocumented immigrants and produced less bias, both in the socio-demographic variables included in the imputation and in unobserved health variables.
• We developed a new machine learning method for imputing the legal status of immigrants that can be used with nationally representative, publicly available data.
• Our findings indicate that machine learning imputation of immigrants' legal status, specifically with the Random Forest algorithm, was more accurate in identifying undocumented immigrants and produced less bias than other imputation methods.
first_indexed	2024-04-11T13:58:19Z
format	Article
id	doaj.art-a307be80291343d8a6f0c01fa004f43a
institution	Directory Open Access Journal
issn	2215-0161
language	English
last_indexed	2024-04-11T13:58:19Z
publishDate	2022-01-01
publisher	Elsevier
record_format	Article
series	MethodsX
volume	9
article_number	101848
affiliations	Simon A. Ruhnke: Berliner Institut für empirische Integrations- und Migrationsforschung (BIM), Berlin, Germany; Fernando A. Wilson: University of Utah, Matheson Center for Health Care Studies, Salt Lake City, UT; Jim P. Stimpson (corresponding author): Drexel University, Department of Health Management and Policy, PA, USA
title	Using machine learning to impute legal status of immigrants in the National Health Interview Survey
topic	Random Forest machine learning
url	http://www.sciencedirect.com/science/article/pii/S221501612200228X
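
The abstract names four validation statistics: sensitivity, specificity, PPV, and accuracy. As an illustration only, the Python sketch below imputes a binary status with a from-scratch K-nearest-neighbor majority vote. It trains on "donor" records whose status is known (in the abstract's setting, presumably SIPP respondents) and scores held-out records against those four statistics. Everything concrete here is a hypothetical assumption, not the authors' specification: the two numeric features, the 0/1 coding (1 = undocumented), Euclidean distance, k = 3, and the helper names `knn_predict` and `validation_metrics`. The Random Forest variant is not shown.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict a 0/1 status for record x by majority vote of its k nearest donors."""
    # Sort donor records by Euclidean distance to x (stable sort keeps donor order on ties)
    neighbors = sorted(
        zip((math.dist(row, x) for row in train_X), train_y),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


def validation_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity, PPV and accuracy, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        # Undefined ratios (e.g. no predicted positives) are reported as NaN
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # share of true positives detected
        "specificity": ratio(tn, tn + fp),  # share of true negatives detected
        "ppv": ratio(tp, tp + fp),          # share of predicted positives that are correct
        "accuracy": ratio(tp + tn, len(pairs)),
    }


# Hypothetical toy data: two made-up numeric features per respondent,
# coded 1 = undocumented, 0 = other (an assumption, not the paper's coding).
donor_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
donor_y = [1, 1, 1, 0, 0, 0]
holdout_X = [(1.1, 0.9), (5.1, 5.0), (2.0, 2.0), (0.8, 1.0), (4.0, 4.0)]
holdout_y = [1, 0, 1, 0, 1]

preds = [knn_predict(donor_X, donor_y, x, k=3) for x in holdout_X]
metrics = validation_metrics(holdout_y, preds)
print(preds)  # [1, 0, 1, 1, 0]
print(metrics)
```

In this toy run there are 2 true positives, 1 true negative, 1 false positive, and 1 false negative. That gives sensitivity 2/3, specificity 1/2, PPV 2/3, and accuracy 3/5. Comparing methods as the abstract describes would mean replacing `knn_predict` with another classifier, for example a fitted scikit-learn `RandomForestClassifier`, while keeping the same validation step.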

Similar Items