Influence of Pattern of Missing Data on Performance of Imputation Methods: An Example from National Data on Drug Injection in Prisons

Background: Policy makers need models to detect groups at high risk of HIV infection. Incomplete records and dirty data are common in national data sets, and missing data complicate model development. Several studies have suggested that the performance of imputation...


Bibliographic Details
Main Authors: Mohammad Reza Baneshi, Azam Rastegari, Ali-Akbar Haghdoost, Saiedeh Haji-Maghsoudi
Format: Article
Language: English
Published: Kerman University of Medical Sciences 2013-05-01
Series: International Journal of Health Policy and Management
Subjects: Missing Data, Mice, Expectation Maximum Algorithm, Drug Injection, National Data
Online Access: http://ijhpm.com/?_action=showPDF&article=2568&_ob=df7026e4644cd5d67b96954a88a47c2a&fileName=full_text.pdf.
collection DOAJ
description Background: Policy makers need models to detect groups at high risk of HIV infection. Incomplete records and dirty data are common in national data sets, and missing data complicate model development. Several studies have suggested that the performance of imputation methods is acceptable when the missing rate is moderate. One issue that has received less attention, and is addressed here, is the role of the pattern of missing data. Methods: We used information on 2720 prisoners. Results from fitting a regression model to the complete data served as the gold standard. Missing data were then generated so that 10%, 20% and 50% of data were lost. In scenario 1, we generated missing values, at these rates, in a single variable that was significant in the gold-standard model (age). In scenario 2, a small proportion of each independent variable was dropped. Four imputation methods were compared, under different Events Per Variable (EPV) values, in terms of selection of important variables and parameter estimation. Results: In scenario 2, bias in the estimates was low and all methods for handling missing data performed similarly. All methods at all missing rates were able to detect the significance of age. In scenario 1, bias in the estimates increased, particularly at the 50% missing rate. Here, at EPVs of 10 and 5, the imputation methods failed to capture the effect of age. Conclusion: In scenario 2, all imputation methods, at all missing rates, detected age as significant. This was not the case in scenario 1. Our results show that the performance of imputation methods depends on the pattern of missing data.
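The two missingness scenarios and the EPV quantity named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the completely-at-random deletion mechanism, the equal split of the missing rate across variables in scenario 2, and the EPV definition (number of events divided by number of candidate predictors) are all assumptions made for illustration, since the record does not specify them.

```python
import random


def mask_single_variable(rows, column, rate, rng):
    """Scenario 1 (assumed mechanism): delete `rate` of the values of one
    variable, chosen completely at random. Returns new rows."""
    n_missing = round(len(rows) * rate)
    chosen = set(rng.sample(range(len(rows)), n_missing))
    return [
        {**row, column: None} if i in chosen else dict(row)
        for i, row in enumerate(rows)
    ]


def mask_spread(rows, columns, rate, rng):
    """Scenario 2 (assumed mechanism): spread the same overall rate evenly,
    so each variable loses a small share (rate / number of variables)."""
    masked = [dict(row) for row in rows]
    per_column = round(len(rows) * rate / len(columns))
    for column in columns:
        # Draw a fresh random set of records for every variable.
        for i in rng.sample(range(len(rows)), per_column):
            masked[i][column] = None
    return masked


def events_per_variable(outcomes, n_predictors):
    """EPV (assumed definition): count of events (outcome == 1) divided by
    the number of candidate predictor variables."""
    events = sum(1 for y in outcomes if y == 1)
    return events / n_predictors
```

In a simulation of this kind, each masked data set would then be passed to the imputation methods under comparison, and the refitted regression estimates compared with those from the complete data. The variable names used to call these functions (for example `"age"`) are up to the caller.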
id doaj.art-63dfa87b008c422cbb588096b6a38da0
institution Directory Open Access Journal
issn 2322-5939
spelling doaj.art-63dfa87b008c422cbb588096b6a38da0 2022-12-22T02:47:17Z eng Kerman University of Medical Sciences International Journal of Health Policy and Management 2322-5939 2013-05-01 118191
title Influence of Pattern of Missing Data on Performance of Imputation Methods: An Example from National Data on Drug Injection in Prisons
topic Missing Data
Mice
Expectation Maximum Algorithm
Drug Injection
National Data