An efficient pseudomedian filter for tiling microrrays

Abstract Background Tiling microarrays are becoming an essential technology in the functional genomics toolbox. They have been applied to the tasks of novel transcript identification, elucidation of transcription factor binding sites, detection of methy...

Full description

Bibliographic Details
Main Authors:	Gerstein Mark B, Carriero Nicholas J, Royce Thomas E
Format:	Article
Language:	English
Published:	BMC 2007-06-01
Series:	BMC Bioinformatics
Online Access:	http://www.biomedcentral.com/1471-2105/8/186

_version_	1828423213771653120
author	Gerstein Mark B Carriero Nicholas J Royce Thomas E
author_facet	Gerstein Mark B Carriero Nicholas J Royce Thomas E
author_sort	Gerstein Mark B
collection	DOAJ
description	<p>Abstract</p> <p>Background</p> <p>Tiling microarrays are becoming an essential technology in the functional genomics toolbox. They have been applied to the tasks of novel transcript identification, elucidation of transcription factor binding sites, detection of methylated DNA and several other applications in several model organisms. These experiments are being conducted at increasingly finer resolutions as the microarray technology enjoys increasingly greater feature densities. The increased densities naturally lead to increased data analysis requirements. Specifically, the most widely employed algorithm for tiling array analysis involves smoothing observed signals by computing pseudomedians within sliding windows, a <it>O</it>(<it>n</it><sup>2</sup>log<it>n</it>) calculation in each window. This poor time complexity is an issue for tiling array analysis and could prove to be a real bottleneck as tiling microarray experiments become grander in scope and finer in resolution.</p> <p>Results</p> <p>We therefore implemented Monahan's HLQEST algorithm that reduces the runtime complexity for computing the pseudomedian of <it>n </it>numbers to <it>O</it>(<it>n</it>log<it>n</it>) from <it>O</it>(<it>n</it><sup>2</sup>log<it>n</it>). For a representative tiling microarray dataset, this modification reduced the smoothing procedure's runtime by nearly 90%. We then leveraged the fact that elements within sliding windows remain largely unchanged in overlapping windows (as one slides across genomic space) to further reduce computation by an additional 43%. This was achieved by the application of skip lists to maintaining a sorted list of values from window to window. This sorted list could be maintained with simple <it>O</it>(log <it>n</it>) inserts and deletes. We illustrate the favorable scaling properties of our algorithms with both time complexity analysis and benchmarking on synthetic datasets.</p> <p>Conclusion</p> <p>Tiling microarray analyses that rely upon a sliding window pseudomedian calculation can require many hours of computation. We have eased this requirement significantly by implementing efficient algorithms that scale well with genomic feature density. This result not only speeds the current standard analyses, but also makes possible ones where many iterations of the filter may be required, such as might be required in a bootstrap or parameter estimation setting. Source code and executables are available at <url>http://tiling.gersteinlab.org/pseudomedian/</url>.</p>
first_indexed	2024-12-10T15:58:57Z
format	Article
id	doaj.art-90efb0ecf22c4529924f4f100237534a
institution	Directory Open Access Journal
issn	1471-2105
language	English
last_indexed	2024-12-10T15:58:57Z
publishDate	2007-06-01
publisher	BMC
record_format	Article
series	BMC Bioinformatics
spelling	doaj.art-90efb0ecf22c4529924f4f100237534a2022-12-22T01:42:31ZengBMCBMC Bioinformatics1471-21052007-06-018118610.1186/1471-2105-8-186An efficient pseudomedian filter for tiling microrraysGerstein Mark BCarriero Nicholas JRoyce Thomas E<p>Abstract</p> <p>Background</p> <p>Tiling microarrays are becoming an essential technology in the functional genomics toolbox. They have been applied to the tasks of novel transcript identification, elucidation of transcription factor binding sites, detection of methylated DNA and several other applications in several model organisms. These experiments are being conducted at increasingly finer resolutions as the microarray technology enjoys increasingly greater feature densities. The increased densities naturally lead to increased data analysis requirements. Specifically, the most widely employed algorithm for tiling array analysis involves smoothing observed signals by computing pseudomedians within sliding windows, a <it>O</it>(<it>n</it><sup>2</sup>log<it>n</it>) calculation in each window. This poor time complexity is an issue for tiling array analysis and could prove to be a real bottleneck as tiling microarray experiments become grander in scope and finer in resolution.</p> <p>Results</p> <p>We therefore implemented Monahan's HLQEST algorithm that reduces the runtime complexity for computing the pseudomedian of <it>n </it>numbers to <it>O</it>(<it>n</it>log<it>n</it>) from <it>O</it>(<it>n</it><sup>2</sup>log<it>n</it>). For a representative tiling microarray dataset, this modification reduced the smoothing procedure's runtime by nearly 90%. We then leveraged the fact that elements within sliding windows remain largely unchanged in overlapping windows (as one slides across genomic space) to further reduce computation by an additional 43%. This was achieved by the application of skip lists to maintaining a sorted list of values from window to window. This sorted list could be maintained with simple <it>O</it>(log <it>n</it>) inserts and deletes. We illustrate the favorable scaling properties of our algorithms with both time complexity analysis and benchmarking on synthetic datasets.</p> <p>Conclusion</p> <p>Tiling microarray analyses that rely upon a sliding window pseudomedian calculation can require many hours of computation. We have eased this requirement significantly by implementing efficient algorithms that scale well with genomic feature density. This result not only speeds the current standard analyses, but also makes possible ones where many iterations of the filter may be required, such as might be required in a bootstrap or parameter estimation setting. Source code and executables are available at <url>http://tiling.gersteinlab.org/pseudomedian/</url>.</p>http://www.biomedcentral.com/1471-2105/8/186
spellingShingle	Gerstein Mark B Carriero Nicholas J Royce Thomas E An efficient pseudomedian filter for tiling microrrays BMC Bioinformatics
title	An efficient pseudomedian filter for tiling microrrays
title_full	An efficient pseudomedian filter for tiling microrrays
title_fullStr	An efficient pseudomedian filter for tiling microrrays
title_full_unstemmed	An efficient pseudomedian filter for tiling microrrays
title_short	An efficient pseudomedian filter for tiling microrrays
title_sort	efficient pseudomedian filter for tiling microrrays
url	http://www.biomedcentral.com/1471-2105/8/186
work_keys_str_mv	AT gersteinmarkb anefficientpseudomedianfilterfortilingmicrorrays AT carrieronicholasj anefficientpseudomedianfilterfortilingmicrorrays AT roycethomase anefficientpseudomedianfilterfortilingmicrorrays AT gersteinmarkb efficientpseudomedianfilterfortilingmicrorrays AT carrieronicholasj efficientpseudomedianfilterfortilingmicrorrays AT roycethomase efficientpseudomedianfilterfortilingmicrorrays

An efficient pseudomedian filter for tiling microrrays

Similar Items