Unsupervised reduction of random noise in complex data by a row-specific, sorted principal component-guided method


Bibliographic Details
Main Authors: Katagiri Fumiaki, Foley Joseph W
Format: Article
Language: English
Published: BMC 2008-11-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/9/508
author Katagiri Fumiaki
Foley Joseph W
collection DOAJ
description Abstract

Background: Large biological data sets, such as expression profiles, benefit from reduction of random noise. Principal component (PC) analysis has been used for this purpose, but it tends to remove small features as well as random noise.

Results: We interpreted the PCs as a mere signal-rich coordinate system and sorted the squared PC-coordinates of each row in descending order. The sorted squared PC-coordinates were compared with the distribution of the ordered squared random noise, and PC-coordinates for insignificant contributions were treated as random noise and nullified. The processed data were transformed back to the initial coordinates as noise-reduced data. To increase the sensitivity of signal capture and reduce the effects of stochastic noise, this procedure was applied to multiple small subsets of rows randomly sampled from a large data set, and the results corresponding to each row of the data set from multiple subsets were averaged. We call this procedure Row-specific, Sorted PRincipal component-guided Noise Reduction (RSPR-NR). Robust performance of RSPR-NR, measured by noise reduction and retention of small features, was demonstrated using simulated data sets. Furthermore, when applied to an actual expression profile data set, RSPR-NR preferentially increased the correlations between genes that share the same Gene Ontology terms, strongly suggesting reduction of random noise in the data set.

Conclusion: RSPR-NR is a robust random noise reduction method that retains small features well. It should be useful in improving the quality of large biological data sets.
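The procedure outlined in the Results can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`null_thresholds`, `denoise_subset`, `rspr_nr_sketch`) are hypothetical, and several details the abstract does not specify are assumptions here, namely that the noise is Gaussian with a known standard deviation `sigma`, that the null distribution of ordered squared noise is estimated by simulation and cut at a 95% per-rank quantile, that coordinates are kept only up to the first rank that fails the comparison, and that the data are not centered before the PCs are computed.

```python
import numpy as np


def null_thresholds(n_coords, sigma, rng, n_sim=2000, q=0.95):
    """Per-rank quantiles of sorted (descending) squared Gaussian noise.

    Assumption: noise is i.i.d. N(0, sigma^2); q is an arbitrary cutoff.
    """
    noise = rng.normal(0.0, sigma, size=(n_sim, n_coords)) ** 2
    noise = -np.sort(-noise, axis=1)  # sort each simulated row, largest first
    return np.quantile(noise, q, axis=0)


def denoise_subset(sub, thresholds):
    """Rotate rows into PC coordinates, nullify noise-like ones, rotate back."""
    # Right singular vectors serve as the PC coordinate system (no centering).
    _, _, vt = np.linalg.svd(sub, full_matrices=False)
    scores = sub @ vt.T                       # PC-coordinates of each row
    sq = scores ** 2
    order = np.argsort(-sq, axis=1)           # row-specific ranking
    sorted_sq = np.take_along_axis(sq, order, axis=1)
    passes = sorted_sq > thresholds[: sq.shape[1]]
    # Assumed rule: keep leading ranks only while all earlier ranks passed.
    keep_sorted = np.cumprod(passes, axis=1).astype(bool)
    keep = np.zeros_like(keep_sorted)
    np.put_along_axis(keep, order, keep_sorted, axis=1)
    return (scores * keep) @ vt               # back to initial coordinates


def rspr_nr_sketch(data, sigma, subset_size=50, n_rounds=200, seed=0):
    """Average row-wise denoised values over many random row subsets."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = data.shape
    thresholds = null_thresholds(min(subset_size, n_cols), sigma, rng)
    total = np.zeros_like(data, dtype=float)
    counts = np.zeros(n_rows)
    for _ in range(n_rounds):
        idx = rng.choice(n_rows, size=subset_size, replace=False)
        total[idx] += denoise_subset(data[idx], thresholds)
        counts[idx] += 1
    out = data.astype(float).copy()           # rows never sampled stay as-is
    seen = counts > 0
    out[seen] = total[seen] / counts[seen][:, None]
    return out
```

A call such as `rspr_nr_sketch(matrix, sigma=1.0)` returns an array of the same shape as `matrix`. Because each row is ranked separately, a row whose signal lies along a minor PC can still retain that coordinate, which is the row-specific property the abstract contrasts with discarding whole PCs.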
id doaj.art-b08efc8f85534c1fbc5081d280ca4f00
institution Directory of Open Access Journals
issn 1471-2105
spelling Katagiri Fumiaki, Foley Joseph W. Unsupervised reduction of random noise in complex data by a row-specific, sorted principal component-guided method. BMC Bioinformatics, vol. 9, no. 1, 508 (2008-11-01). BMC. ISSN 1471-2105. doi:10.1186/1471-2105-9-508. http://www.biomedcentral.com/1471-2105/9/508