Integrative missing value estimation for microarray data


Bibliographic Details
Main Authors: Zhou Xianghong, Waterman Michael S, Li Haifeng, Hu Jianjun
Format: Article
Language: English
Published: BMC 2006-10-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/7/449
_version_ 1811251015222034432
author Zhou Xianghong
Waterman Michael S
Li Haifeng
Hu Jianjun
collection DOAJ
description <p>Abstract</p> <p>Background</p> <p>Missing value estimation is an important preprocessing step in microarray analysis. Although several methods have been developed to solve this problem, their performance is unsatisfactory for datasets with high rates of missing data, high measurement noise, or limited numbers of samples. In fact, more than 80% of the time-series datasets in the Stanford Microarray Database contain fewer than eight samples.</p> <p>Results</p> <p>We present the integrative Missing Value Estimation method (iMISS), which incorporates information from multiple reference microarray datasets to improve missing value estimation. For each gene with missing data, we derive a consistent neighbor-gene list by taking the reference datasets into consideration. To determine whether the given reference datasets are sufficiently informative for integration, we use a submatrix imputation approach. Our experiments showed that iMISS can significantly and consistently improve the accuracy of the state-of-the-art Local Least Squares (LLS) imputation algorithm, by up to 15% in our benchmark tests.</p> <p>Conclusion</p> <p>We demonstrated that the order-statistics-based integrative imputation algorithms can achieve significant improvements over state-of-the-art missing value estimation approaches such as LLS, and are especially good for imputing microarray datasets with a limited number of samples, high rates of missing data, or very noisy measurements. With the rapid accumulation of microarray datasets, the performance of our approach can be further improved by incorporating larger and more appropriate reference datasets.</p>
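The neighbor-selection idea in the abstract can be illustrated with a small sketch. This is not the authors' iMISS implementation. It assumes a simplified scheme in which each gene's neighbor rank is averaged across the primary and reference datasets (a stand-in for the order-statistics aggregation the abstract mentions), and it fills each missing entry with the plain mean of the selected neighbors rather than an LLS regression. All function names and the toy data are hypothetical.

```python
import math

def correlation_distance(x, y):
    """1 - Pearson r over positions observed (not None) in both vectors."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 2:
        return 1.0
    mx = sum(a for a, _ in pairs) / len(pairs)
    my = sum(b for _, b in pairs) / len(pairs)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 1.0
    return 1.0 - sxy / math.sqrt(sxx * syy)

def neighbor_ranks(dataset, target):
    """Rank all other genes by distance to the target (1 = closest)."""
    others = [g for g in dataset if g != target]
    others.sort(key=lambda g: correlation_distance(dataset[target], dataset[g]))
    return {g: rank for rank, g in enumerate(others, start=1)}

def consistent_neighbors(datasets, target, k):
    """Assumed aggregation: average each gene's rank across datasets, keep the k best."""
    per_dataset = [neighbor_ranks(d, target) for d in datasets if target in d]
    genes = set.intersection(*(set(r) for r in per_dataset))
    mean_rank = {g: sum(r[g] for r in per_dataset) / len(per_dataset) for g in genes}
    return sorted(genes, key=lambda g: (mean_rank[g], g))[:k]

def impute(primary, references, target, k=2):
    """Fill missing entries of the target gene with the mean of its neighbors (not LLS)."""
    neighbors = consistent_neighbors([primary] + references, target, k)
    filled = list(primary[target])
    for j, value in enumerate(filled):
        if value is None:
            observed = [primary[g][j] for g in neighbors if primary[g][j] is not None]
            if observed:
                filled[j] = sum(observed) / len(observed)
    return filled

# Toy data: the primary dataset has one missing value for g1; the reference has none.
primary = {
    "g1": [1.0, 2.0, None, 4.0],
    "g2": [1.1, 2.1, 3.1, 4.1],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [1.0, 2.2, 2.9, 4.2],
}
reference = {
    "g1": [0.5, 1.0, 1.5],
    "g2": [0.6, 1.1, 1.4],
    "g3": [1.5, 1.0, 0.4],
    "g4": [0.4, 1.2, 1.6],
}

neighbors = consistent_neighbors([primary, reference], "g1", k=2)
estimate = impute(primary, [reference], "g1", k=2)
```

In this toy case the anti-correlated gene g3 ranks last in both datasets, so only g2 and g4 contribute to the estimate. A gene that looks close in the primary dataset by chance but ranks poorly in the references would be demoted in the same way.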
first_indexed 2024-04-12T16:13:17Z
format Article
id doaj.art-fea8c1962b344c0693740c3a028e44d7
institution Directory of Open Access Journals
issn 1471-2105
language English
last_indexed 2024-04-12T16:13:17Z
publishDate 2006-10-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-fea8c1962b344c0693740c3a028e44d7; 2022-12-22T03:25:50Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2006-10-01; 7; 1; 449; 10.1186/1471-2105-7-449; Integrative missing value estimation for microarray data; Zhou Xianghong; Waterman Michael S; Li Haifeng; Hu Jianjun; http://www.biomedcentral.com/1471-2105/7/449
title Integrative missing value estimation for microarray data
url http://www.biomedcentral.com/1471-2105/7/449