Shape based indexing for faster search of RNA family databases


Bibliographic Details
Main Authors:	Reeder Jens, Janssen Stefan, Giegerich Robert
Format:	Article
Language:	English
Published:	BMC 2008-02-01
Series:	BMC Bioinformatics
Online Access:	http://www.biomedcentral.com/1471-2105/9/131

author	Reeder Jens Janssen Stefan Giegerich Robert
collection	DOAJ
description	Abstract. Background: Most non-coding RNA families exert their function by means of a conserved, common secondary structure. The Rfam database contains more than five hundred structurally annotated RNA families. Unfortunately, searching for new family members using covariance models (CMs) is very time-consuming. Filtering approaches that use sequence conservation to reduce the number of CM searches are fast, but it is unknown at what cost in sensitivity. Results: We present a new filtering approach which exploits the family-specific secondary structure and significantly reduces the number of CM searches. When Rfam is evaluated against itself, the filter eliminates approximately 85% of the queries and discards only 2.6% of true positives. First results also capture previously undetected non-coding RNAs in a recent human RNAz screen. Conclusion: The RNA shape index filter (RNAsifter) is based on the following rationale: an RNA family is characterised by its structure much more succinctly than by its sequence content. Structures of individual family members, which naturally differ in length and sequence composition, may vary in detail, but overall they share a common shape in a more abstract sense. Given a fixed release of the Rfam database, we can compute these abstract shapes for all families; the result is called a shape index. If a query sequence belongs to a certain family, it must be able to fold into the family shape with reasonable free energy. Therefore, rather than matching the query against all families in the database, we can first (and quickly) compute its feasible shape(s) and use the shape index to access only those families where a good match is possible because they share a shape with the query.
first_indexed	2024-04-13T15:43:11Z
format	Article
id	doaj.art-8b52df046775401e9a86976f65ff20ab
institution	Directory Open Access Journal
issn	1471-2105
language	English
last_indexed	2024-04-13T15:43:11Z
publishDate	2008-02-01
publisher	BMC
record_format	Article
series	BMC Bioinformatics
spelling	doaj.art-8b52df046775401e9a86976f65ff20ab 2022-12-22T02:41:05Z eng BMC BMC Bioinformatics 1471-2105 2008-02-01 9 1 131 10.1186/1471-2105-9-131 Shape based indexing for faster search of RNA family databases Reeder Jens Janssen Stefan Giegerich Robert http://www.biomedcentral.com/1471-2105/9/131
title	Shape based indexing for faster search of RNA family databases
url	http://www.biomedcentral.com/1471-2105/9/131
work_keys_str_mv	AT reederjens shapebasedindexingforfastersearchofrnafamilydatabases AT janssenstefan shapebasedindexingforfastersearchofrnafamilydatabases AT giegerichrobert shapebasedindexingforfastersearchofrnafamilydatabases
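
The filtering rationale described in the abstract's Conclusion can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' RNAsifter implementation: it uses a level-5-style shape abstraction (unpaired bases ignored; each helix, including bulges and interior loops, collapsed into one `[]` pair), derives a family's shapes from member structures given in dot-bracket notation, and assumes the query's candidate structures and free energies (kcal/mol) were already computed by some folding tool. The names `dot_bracket_to_shape`, `build_shape_index`, `candidate_families` and the `max_energy` cutoff are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def dot_bracket_to_shape(structure: str) -> str:
    """Abstract a dot-bracket structure into a level-5-style shape string.

    Unpaired bases are ignored, and a chain of base pairs in which each pair
    directly encloses exactly one other pair (stacked pairs, bulges, interior
    loops) is collapsed into a single '[]' helix.
    """
    children: dict[int, list[int]] = defaultdict(list)  # pairs enclosed by a pair
    roots: list[int] = []  # outermost pairs, left to right
    stack: list[int] = []  # positions of currently open '('
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            opening = stack.pop()
            # Attach the closed pair to the pair enclosing it (or to the top level).
            (children[stack[-1]] if stack else roots).append(opening)
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced structure")

    def render(pair: int) -> str:
        # Follow the helix inward while the pair encloses exactly one pair.
        while len(children[pair]) == 1:
            pair = children[pair][0]
        return "[" + "".join(render(c) for c in children[pair]) + "]"

    return "".join(render(r) for r in roots)


def build_shape_index(families: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each shape to every family that has at least one member of that shape."""
    index: dict[str, set[str]] = defaultdict(set)
    for family, structures in families.items():
        for structure in structures:
            index[dot_bracket_to_shape(structure)].add(family)
    return dict(index)


def candidate_families(
    index: dict[str, set[str]],
    query_structures: list[tuple[str, float]],
    max_energy: float,
) -> set[str]:
    """Return families sharing a shape the query can fold into at or below max_energy."""
    feasible = {
        dot_bracket_to_shape(structure)
        for structure, energy in query_structures
        if energy <= max_energy
    }
    hits: set[str] = set()
    for shape in feasible:
        hits |= index.get(shape, set())
    return hits


if __name__ == "__main__":
    # Toy data invented for illustration; not taken from Rfam.
    families = {
        "famA": ["((((...))))", "(((..((...))..)))"],
        "famB": ["((..((...))..((...))))"],
    }
    index = build_shape_index(families)
    # (dot-bracket, free energy in kcal/mol) pairs, assumed precomputed by a folding tool.
    query = [("((.((...)).))", -12.3), ("((...))..((...))", -4.0)]
    print(sorted(candidate_families(index, query, max_energy=-5.0)))  # ['famA']
```

Only the families returned by the lookup would then be searched with their covariance models; the CM search itself is outside this sketch. In this scheme the saving comes from skipping CM searches for all other families, while the index lookup costs one dictionary probe per feasible shape.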

Similar Items