Recovery of information from multiple imputation: a simulation study

Abstract. Background: Multiple imputation is becoming increasingly popular for handling missing data. However, it is often implemented without adequate consideration of whether it offers any advantage over complete case analysis for the research question...


Bibliographic Details
Main Authors: Lee Katherine J, Carlin John B
Format: Article
Language: English
Published: BMC 2012-06-01
Series: Emerging Themes in Epidemiology
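Subjects: Missing data; Multiple imputation; Fully conditional specification; Multivariate normal imputation; Non-normal data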
Online Access: http://www.ete-online.com/content/9/1/3
author Lee Katherine J
Carlin John B
collection DOAJ
description Background: Multiple imputation is becoming increasingly popular for handling missing data. However, it is often implemented without adequate consideration of whether it offers any advantage over complete case analysis for the research question of interest, or whether potential gains may be offset by bias from a poorly fitting imputation model, particularly as the amount of missing data increases.
Methods: Simulated datasets (n = 1000) drawn from a synthetic population were used to explore information recovery from multiple imputation in estimating the coefficient of a binary exposure variable when various proportions of data (10-90%) were set missing at random in a highly-skewed continuous covariate or in the binary exposure. Imputation was performed using multivariate normal imputation (MVNI), with a simple or zero-skewness log transformation to manage non-normality. Bias, precision, mean-squared error and coverage for a set of regression parameter estimates were compared between multiple imputation and complete case analyses.
Results: For missingness in the continuous covariate, multiple imputation produced less bias and greater precision for the effect of the binary exposure variable, compared with complete case analysis, with larger gains in precision with more missing data. However, even with only moderate missingness, large bias and substantial under-coverage were apparent in estimating the continuous covariate’s effect when skewness was not adequately addressed. For missingness in the binary covariate, all estimates had negligible bias but gains in precision from multiple imputation were minimal, particularly for the coefficient of the binary exposure.
Conclusions: Although multiple imputation can be useful if covariates required for confounding adjustment are missing, benefits are likely to be minimal when data are missing in the exposure variable of interest. Furthermore, when there are large amounts of missingness, multiple imputation can become unreliable and introduce bias not present in a complete case analysis if the imputation model is not appropriate. Epidemiologists dealing with missing data should keep in mind the potential limitations as well as the potential benefits of multiple imputation. Further work is needed to provide clearer guidelines on effective application of this method.
first_indexed 2024-12-13T19:14:09Z
format Article
id doaj.art-d4c6a8df3ea34f6ba3568db3a36710f7
institution Directory Open Access Journal
issn 1742-7622
language English
last_indexed 2024-12-13T19:14:09Z
publishDate 2012-06-01
publisher BMC
record_format Article
series Emerging Themes in Epidemiology
doi 10.1186/1742-7622-9-3
title Recovery of information from multiple imputation: a simulation study
topic Missing data
Multiple imputation
Fully conditional specification
Multivariate normal imputation
Non-normal data
url http://www.ete-online.com/content/9/1/3