Using machine learning to impute legal status of immigrants in the National Health Interview Survey
We describe a novel machine learning method of imputing legal status for immigrants using nationally representative survey data from the Survey of Income and Program Participation (SIPP) and the National Health Interview Survey (NHIS). K-nearest Neighbor (KNN) classifier and Random Forest (RF) Algor...
Main Authors: | Simon A. Ruhnke, Fernando A. Wilson, Jim P. Stimpson |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier 2022-01-01 |
Series: | MethodsX |
Subjects: | Random Forest machine learning |
Online Access: | http://www.sciencedirect.com/science/article/pii/S221501612200228X |
_version_ | 1811187210122166272 |
---|---|
author | Simon A. Ruhnke, Fernando A. Wilson, Jim P. Stimpson |
author_sort | Simon A. Ruhnke |
collection | DOAJ |
description | We describe a novel machine learning method for imputing the legal status of immigrants using nationally representative survey data from the Survey of Income and Program Participation (SIPP) and the National Health Interview Survey (NHIS). A K-nearest Neighbor (KNN) classifier and the Random Forest (RF) algorithm are presented as novel imputation methods and compared with established regression-based imputation. When the imputation methods were validated using sensitivity, specificity, positive predictive value (PPV) and accuracy statistics, the Random Forest algorithm identified undocumented immigrants more accurately than regression-based imputation and KNN, and it minimized bias both in the socio-demographic variables included in the imputation and in unobserved health variables. • We developed a new machine learning method of imputing legal status for immigrants that can be used with nationally representative, publicly available data. • Our findings indicate that machine learning imputation of immigrants' legal status, specifically with the Random Forest algorithm, was more accurate in identifying undocumented immigrants and minimized bias relative to other imputation methods. |
first_indexed | 2024-04-11T13:58:19Z |
format | Article |
id | doaj.art-a307be80291343d8a6f0c01fa004f43a |
institution | Directory Open Access Journal |
issn | 2215-0161 |
language | English |
last_indexed | 2024-04-11T13:58:19Z |
publishDate | 2022-01-01 |
publisher | Elsevier |
record_format | Article |
series | MethodsX |
spelling | doaj.art-a307be80291343d8a6f0c01fa004f43a; 2022-12-22T04:20:11Z; eng; Elsevier; MethodsX; 2215-0161; 2022-01-01; 9; 101848; Using machine learning to impute legal status of immigrants in the National Health Interview Survey; Simon A. Ruhnke (0), Fernando A. Wilson (1), Jim P. Stimpson (2); (0) Berliner Institut für empirische Integrations- und Migrationsforschung/BIM, Berlin, Germany; (1) University of Utah, Matheson Center for Health Care Studies, Salt Lake City, UT; (2) Drexel University, Department of Health Management and Policy, PA, USA; Corresponding author.; http://www.sciencedirect.com/science/article/pii/S221501612200228X; Random Forest machine learning |
title | Using machine learning to impute legal status of immigrants in the National Health Interview Survey |
topic | Random Forest machine learning |
url | http://www.sciencedirect.com/science/article/pii/S221501612200228X |
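
The description names four validation statistics (sensitivity, specificity, positive predictive value and accuracy) without defining them. The following is a minimal sketch of how these are conventionally computed from binary labels; it is not the authors' code. The function name `validation_stats` and the toy label vectors are hypothetical, and the coding 1 = undocumented, 0 = documented is an assumption made only for illustration.

```python
import math


def validation_stats(actual, predicted):
    """Compute sensitivity, specificity, PPV and accuracy for binary labels.

    Assumed coding (illustrative only): 1 = positive class, 0 = negative class.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives

    def ratio(numerator, denominator):
        # Return NaN rather than raising when a class is absent.
        return numerator / denominator if denominator else math.nan

    return {
        "sensitivity": ratio(tp, tp + fn),  # share of true positives detected
        "specificity": ratio(tn, tn + fp),  # share of true negatives detected
        "ppv": ratio(tp, tp + fp),          # share of positive calls that are correct
        "accuracy": ratio(tp + tn, len(pairs)),
    }


# Hypothetical toy data, not taken from SIPP or NHIS.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 0, 1]
stats = validation_stats(actual, predicted)
print(stats)
```

In a comparison like the one the description reports, the same function would be applied to the output of each imputation method against a sample where the true status is known, so the methods can be ranked on identical statistics.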