Weighted multiple imputation of ethnicity data that are missing not at random in primary care databases

ABSTRACT Background Ethnicity is an important factor to be considered in health research because of its association with inequality in disease prevalence and the utilisation of healthcare. Ethnicity recording has been incorporated in primary care electronic health records, and hence is available...

Bibliographic Details
Main Authors:	Tra My Pham, Irene Petersen, James Carpenter, Tim Morris
Format:	Article
Language:	English
Published:	Swansea University 2017-04-01
Series:	International Journal of Population Data Science
Online Access:	https://ijpds.org/article/view/54

_version_	1797426393563267072
author	Tra My Pham; Irene Petersen; James Carpenter; Tim Morris
author_facet	Tra My Pham; Irene Petersen; James Carpenter; Tim Morris
author_sort	Tra My Pham
collection	DOAJ
description	Background: Ethnicity is an important factor in health research because of its association with inequality in disease prevalence and the utilisation of healthcare. Ethnicity recording has been incorporated in primary care electronic health records, and hence is available in large UK primary care databases such as The Health Improvement Network (THIN). However, since primary care data are routinely collected for clinical purposes, a large amount of data relevant for research, including ethnicity, is often missing. A popular approach for missing data is multiple imputation (MI). However, the conventional MI method, which assumes data are missing at random, does not give plausible estimates of the ethnicity distribution in THIN compared with the general UK population. This might be because ethnicity data in primary care are likely to be missing not at random. Objectives: I propose a new MI method, termed ‘weighted multiple imputation’, to deal with data that are missing not at random in categorical variables. Methods: Weighted MI combines MI with probability weights that are calculated using external data sources. Census summary statistics for ethnicity can be used to form weights in weighted MI such that the correct marginal ethnic breakdown is recovered in THIN. I conducted a simulation study to examine weighted MI when ethnicity data are missing not at random. In this simulation study, which resembled a THIN dataset, ethnicity was an independent variable in a survival model alongside other covariates. Weighted MI was compared with conventional MI and other traditional missing data methods, including complete case analysis and single imputation. Results: While a small bias was still present in ethnicity coefficient estimates under weighted MI, it was less severe than under MI assuming missing at random. Complete case analysis and single imputation were inadequate to handle data that are missing not at random in ethnicity. Conclusions: Although not a total cure, weighted MI represents a pragmatic approach that has potential applications not only in ethnicity but also in other incomplete categorical health indicators in electronic health records.
first_indexed	2024-03-09T08:29:35Z
format	Article
id	doaj.art-3d1b183345924eca932c9a9862b4031f
institution	Directory Open Access Journal
issn	2399-4908
language	English
last_indexed	2024-03-09T08:29:35Z
publishDate	2017-04-01
publisher	Swansea University
record_format	Article
series	International Journal of Population Data Science
spelling	doaj.art-3d1b183345924eca932c9a9862b4031f | 2023-12-02T20:23:51Z | eng | Swansea University | International Journal of Population Data Science | 2399-4908 | 2017-04-01 | 1 | 1 | 10.23889/ijpds.v1i1.54 | 54 | Weighted multiple imputation of ethnicity data that are missing not at random in primary care databases | Tra My Pham (University College London); Irene Petersen (University College London); James Carpenter (London School of Hygiene & Tropical Medicine); Tim Morris (University College London) | https://ijpds.org/article/view/54
title	Weighted multiple imputation of ethnicity data that are missing not at random in primary care databases
url	https://ijpds.org/article/view/54
work_keys_str_mv	AT tramypham weightedmultipleimputationofethnicitydatathataremissingnotatrandominprimarycaredatabases AT irenepetersen weightedmultipleimputationofethnicitydatathataremissingnotatrandominprimarycaredatabases AT jamescarpenter weightedmultipleimputationofethnicitydatathataremissingnotatrandominprimarycaredatabases AT timmorris weightedmultipleimputationofethnicitydatathataremissingnotatrandominprimarycaredatabases
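
The Methods passage of the abstract says that census summary statistics can be used to form weights so that the correct marginal breakdown is recovered after imputation. The record does not give the weight formula, so the following is only a minimal sketch of one plausible reading, not the authors' published algorithm. It assumes the weight for each category is the share that category must have among the missing records divided by its share among the observed records. The categories "A", "B", "C", all counts and census shares, and the function names `calibration_weights` and `impute_once` are hypothetical. The sketch draws from the weighted marginal distribution only; it ignores the covariates and the survival model mentioned in the abstract.

```python
import random
from collections import Counter


def calibration_weights(observed, census, n_missing):
    """Per-category weight: implied share among missing / share among observed.

    Assumed formula (not taken from the record): with r the proportion of
    records observed, the share needed among missing records is
    (p_census - r * p_observed) / (1 - r), floored at zero.
    """
    n_obs = len(observed)
    if n_obs == 0 or n_missing <= 0:
        raise ValueError("need at least one observed and one missing record")
    r = n_obs / (n_obs + n_missing)
    counts = Counter(observed)
    weights = {}
    for cat, p_pop in census.items():
        p_obs = counts[cat] / n_obs
        if p_obs == 0:
            raise ValueError(f"category {cat!r} never observed; weight undefined")
        p_mis = max((p_pop - r * p_obs) / (1 - r), 0.0)
        weights[cat] = p_mis / p_obs
    return weights


def impute_once(observed, weights, n_missing, rng):
    """Draw one set of imputed categories from the weighted observed margin."""
    counts = Counter(observed)
    cats = list(weights)
    # Weighted observed counts are proportional to the implied missing shares.
    draw_weights = [counts[c] * weights[c] for c in cats]
    return rng.choices(cats, weights=draw_weights, k=n_missing)


# Made-up illustration: category "A" is over-represented among observed records.
observed = ["A"] * 800 + ["B"] * 100 + ["C"] * 100
census = {"A": 0.7, "B": 0.2, "C": 0.1}  # hypothetical external margin
n_missing = 1000

weights = calibration_weights(observed, census, n_missing)
rng = random.Random(2017)
imputations = [impute_once(observed, weights, n_missing, rng) for _ in range(5)]

total = len(observed) + n_missing
for m, imputed in enumerate(imputations, start=1):
    combined = Counter(observed) + Counter(imputed)
    print(m, {c: round(combined[c] / total, 3) for c in census})
```

In each of the five imputed datasets the combined observed-plus-imputed shares should sit close to the hypothetical census margin, which is the property the abstract describes. A full analysis would fit the substantive model in each imputed dataset and pool the estimates, which this sketch leaves out.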
