A mini-greedy algorithm for faster structural RNA stem-loop search.

When a set of coregulated genes share a common structural RNA motif, e.g. a hairpin, most motif search approaches fail to locate the covarying but structurally conserved motif. There do exist methods that can locate structural RNA motifs, like FOLDALIGN, but the main problem with these methods is th...

Full description

Bibliographic Details
Main Authors:	Gorodkin, J, Lyngso, R, Stormo, G
Format:	Journal article
Language:	English
Published:	2001

_version_	1826259716020371456
author	Gorodkin, J Lyngso, R Stormo, G
author_facet	Gorodkin, J Lyngso, R Stormo, G
author_sort	Gorodkin, J
collection	OXFORD
description	When a set of coregulated genes share a common structural RNA motif, e.g. a hairpin, most motif search approaches fail to locate the covarying but structurally conserved motif. There do exist methods that can locate structural RNA motifs, like FOLDALIGN, but the main problem with these methods is that they are computationally expensive. In FOLDALIGN, a major contribution to this is the use of a greedy algorithm to construct the multiple alignment. To ensure good quality many redundant computations must be made. However, by applying the greedy algorithm on a carefully selected subset of sequences, near full greedy quality can be obtained. The basic idea is to estimate the order in which the sequences entered a good greedy alignment. If such a ranking, found from all pairwise alignments, is in good agreement with the order of appearance in the multiple alignment, the core structural motif can be found by performing the greedy algorithm on just the top sequences in the ranking. The ranking used in this mini-greedy algorithm is found by using two complementing approaches: 1) When interpreting the FOLDALIGN score as an inner product (kernel), the sequences can be ranked according to their distance to their center of mass; 2) We construct an algorithm that attempts to find the K closest sequences in the vector space associated with the inner product, and the remaining sequences can be ranked by their minimum distance to any of the sequences, or to the center of mass in this set. The two approaches arecompared and merged, and the results discussed. We also show that structural alignments of near full greedy quality can found in significantly reduced time, using these methods. The algorithm is being included in the SLASH (Stem-Loop Align SearcH) server available at http://www.bioinf.au.dk/slash.
first_indexed	2024-03-06T18:54:11Z
format	Journal article
id	oxford-uuid:113f980e-8f52-46b8-b2e9-0e7794875510
institution	University of Oxford
language	English
last_indexed	2024-03-06T18:54:11Z
publishDate	2001
record_format	dspace
spelling	oxford-uuid:113f980e-8f52-46b8-b2e9-0e77948755102022-03-26T10:01:18ZA mini-greedy algorithm for faster structural RNA stem-loop search.Journal articlehttp://purl.org/coar/resource_type/c_dcae04bcuuid:113f980e-8f52-46b8-b2e9-0e7794875510EnglishSymplectic Elements at Oxford2001Gorodkin, JLyngso, RStormo, GWhen a set of coregulated genes share a common structural RNA motif, e.g. a hairpin, most motif search approaches fail to locate the covarying but structurally conserved motif. There do exist methods that can locate structural RNA motifs, like FOLDALIGN, but the main problem with these methods is that they are computationally expensive. In FOLDALIGN, a major contribution to this is the use of a greedy algorithm to construct the multiple alignment. To ensure good quality many redundant computations must be made. However, by applying the greedy algorithm on a carefully selected subset of sequences, near full greedy quality can be obtained. The basic idea is to estimate the order in which the sequences entered a good greedy alignment. If such a ranking, found from all pairwise alignments, is in good agreement with the order of appearance in the multiple alignment, the core structural motif can be found by performing the greedy algorithm on just the top sequences in the ranking. The ranking used in this mini-greedy algorithm is found by using two complementing approaches: 1) When interpreting the FOLDALIGN score as an inner product (kernel), the sequences can be ranked according to their distance to their center of mass; 2) We construct an algorithm that attempts to find the K closest sequences in the vector space associated with the inner product, and the remaining sequences can be ranked by their minimum distance to any of the sequences, or to the center of mass in this set. The two approaches arecompared and merged, and the results discussed. We also show that structural alignments of near full greedy quality can found in significantly reduced time, using these methods. The algorithm is being included in the SLASH (Stem-Loop Align SearcH) server available at http://www.bioinf.au.dk/slash.
spellingShingle	Gorodkin, J Lyngso, R Stormo, G A mini-greedy algorithm for faster structural RNA stem-loop search.
title	A mini-greedy algorithm for faster structural RNA stem-loop search.
title_full	A mini-greedy algorithm for faster structural RNA stem-loop search.
title_fullStr	A mini-greedy algorithm for faster structural RNA stem-loop search.
title_full_unstemmed	A mini-greedy algorithm for faster structural RNA stem-loop search.
title_short	A mini-greedy algorithm for faster structural RNA stem-loop search.
title_sort	mini greedy algorithm for faster structural rna stem loop search
work_keys_str_mv	AT gorodkinj aminigreedyalgorithmforfasterstructuralrnastemloopsearch AT lyngsor aminigreedyalgorithmforfasterstructuralrnastemloopsearch AT stormog aminigreedyalgorithmforfasterstructuralrnastemloopsearch AT gorodkinj minigreedyalgorithmforfasterstructuralrnastemloopsearch AT lyngsor minigreedyalgorithmforfasterstructuralrnastemloopsearch AT stormog minigreedyalgorithmforfasterstructuralrnastemloopsearch

A mini-greedy algorithm for faster structural RNA stem-loop search.

Similar Items