Weighted multiple imputation of ethnicity data that are missing not at random in primary care databases

Bibliographic Details
Main Authors: Tra My Pham, Irene Petersen, James Carpenter, Tim Morris
Format: Article
Language: English
Published: Swansea University 2017-04-01
Series: International Journal of Population Data Science
Online Access: https://ijpds.org/article/view/54
author Tra My Pham
Irene Petersen
James Carpenter
Tim Morris
collection DOAJ
description ABSTRACT
Background: Ethnicity is an important factor to be considered in health research because of its association with inequality in disease prevalence and the utilisation of healthcare. Ethnicity recording has been incorporated in primary care electronic health records, and hence is available in large UK primary care databases such as The Health Improvement Network (THIN). However, since primary care data are routinely collected for clinical purposes, a large amount of data that are relevant for research including ethnicity is often missing. A popular approach for missing data is multiple imputation (MI). However, the conventional MI method assuming data are missing at random does not give plausible estimates of the ethnicity distribution in THIN compared to the general UK population. This might be due to the fact that ethnicity data in primary care are likely to be missing not at random.
Objectives: I propose a new MI method, termed ‘weighted multiple imputation’, to deal with data that are missing not at random in categorical variables.
Methods: Weighted MI combines MI and probability weights which are calculated using external data sources. Census summary statistics for ethnicity can be used to form weights in weighted MI such that the correct marginal ethnic breakdown is recovered in THIN. I conducted a simulation study to examine weighted MI when ethnicity data are missing not at random. In this simulation study which resembled a THIN dataset, ethnicity was an independent variable in a survival model alongside other covariates. Weighted MI was compared to the conventional MI and other traditional missing data methods including complete case analysis and single imputation.
Results: While a small bias was still present in ethnicity coefficient estimates under weighted MI, it was less severe compared to MI assuming missing at random. Complete case analysis and single imputation were inadequate to handle data that are missing not at random in ethnicity.
Conclusions: Although not a total cure, weighted MI represents a pragmatic approach that has potential applications not only in ethnicity but also in other incomplete categorical health indicators in electronic health records.
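The abstract's Methods text says that census summary statistics are used to form weights so that the imputed data recover the correct marginal ethnic breakdown. The record does not give the weight formula, so the following is only a minimal sketch of one way that idea could work, not the authors' method. It assumes a model with no covariates, in which each observed record is weighted by the ratio of the category share needed among the missing records to the category share among the observed records. The function names, category labels, counts and census shares are all invented for illustration and are not taken from THIN or any census.

```python
import random
from collections import Counter


def marginal_weights(observed, n_missing, census):
    """Hypothetical per-category weights for the observed records.

    Assumption (not from the source): weight = share of the category
    needed among the missing records to hit the census margin, divided
    by the share of the category among the observed records.
    """
    if n_missing <= 0:
        raise ValueError("n_missing must be positive")
    n_obs = len(observed)
    total = n_obs + n_missing
    counts = Counter(observed)
    weights = {}
    for category, census_share in census.items():
        if counts[category] == 0:
            raise ValueError(f"no observed records in category {category!r}")
        # Share of this category required among the missing records.
        needed_share = (census_share * total - counts[category]) / n_missing
        if needed_share < 0:
            raise ValueError(f"census margin unreachable for {category!r}")
        observed_share = counts[category] / n_obs
        weights[category] = needed_share / observed_share
    return weights


def impute_once(observed, n_missing, weights, rng):
    """Draw one imputed data set from the weighted observed distribution."""
    counts = Counter(observed)
    categories = sorted(weights)
    # Weighted count per category; proportional to the needed share.
    draw_weights = [counts[c] * weights[c] for c in categories]
    return rng.choices(categories, weights=draw_weights, k=n_missing)


# Invented illustrative numbers, not real THIN or census figures.
observed = ["White"] * 900 + ["Asian"] * 60 + ["Black"] * 40
census = {"White": 0.86, "Asian": 0.08, "Black": 0.06}
n_missing = 1000

weights = marginal_weights(observed, n_missing, census)
rng = random.Random(2017)
# Multiple imputation: repeat the draw to obtain several completed data sets.
imputations = [impute_once(observed, n_missing, weights, rng) for _ in range(5)]
```

With these invented inputs the weights come out at roughly 0.91 for White, 1.67 for Asian and 2.0 for Black, so under-recorded groups are drawn more often than their observed share alone would suggest. An imputation model in practice would also condition on covariates and the outcome, which this sketch leaves out.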
first_indexed 2024-03-09T08:29:35Z
format Article
id doaj.art-3d1b183345924eca932c9a9862b4031f
institution Directory Open Access Journal
issn 2399-4908
language English
last_indexed 2024-03-09T08:29:35Z
publishDate 2017-04-01
publisher Swansea University
record_format Article
series International Journal of Population Data Science
spelling doaj.art-3d1b183345924eca932c9a9862b4031f
2023-12-02T20:23:51Z
eng
Swansea University
International Journal of Population Data Science
2399-4908
2017-04-01
1
1
10.23889/ijpds.v1i1.54
54
Weighted multiple imputation of ethnicity data that are missing not at random in primary care databases
Tra My Pham (University College London)
Irene Petersen (University College London)
James Carpenter (London School of Hygiene & Tropical Medicine)
Tim Morris (University College London)
https://ijpds.org/article/view/54
title Weighted multiple imputation of ethnicity data that are missing not at random in primary care databases
url https://ijpds.org/article/view/54