ScanFold 2.0: a rapid approach for identifying potential structured RNA targets in genomes and transcriptomes

A major limiting factor in target discovery for both basic research and therapeutic intervention is the identification of structural and/or functional RNA elements in genomes and transcriptomes. This was the impetus for the original ScanFold algorithm, which provides maps of local RNA structural sta...

Bibliographic Details
Main Authors: Ryan J. Andrews, Warren B. Rouse, Collin A. O’Leary, Nicholas J. Booher, Walter N. Moss
Format: Article
Language: English
Published: PeerJ Inc. 2022-11-01
Series: PeerJ
Subjects: RNA, Motif discovery, RNA structure, Genome annotation, Sequence analysis
Online Access: https://peerj.com/articles/14361.pdf
author Ryan J. Andrews
Warren B. Rouse
Collin A. O’Leary
Nicholas J. Booher
Walter N. Moss
collection DOAJ
description A major limiting factor in target discovery for both basic research and therapeutic intervention is the identification of structural and/or functional RNA elements in genomes and transcriptomes. This was the impetus for the original ScanFold algorithm, which provides maps of local RNA structural stability, evidence of sequence-ordered (potentially evolved) structure, and unique model structures comprised of recurring base pairs with the greatest structural bias. A key step in quantifying this propensity for ordered structure is the prediction of secondary structural stability for randomized sequences which, in the original implementation of ScanFold, is explicitly evaluated. This slow process has limited the rapid identification of ordered structures in large genomes/transcriptomes, which we seek to overcome in this current work introducing ScanFold 2.0. In this revised version of ScanFold, we no longer explicitly evaluate randomized sequence folding energy, but rather estimate it using a machine learning approach. For high randomization numbers, this can increase prediction speeds over 100-fold compared to ScanFold 1.0, allowing for the analysis of large sequences, as well as the use of additional folding algorithms that may be computationally expensive. In the testing of ScanFold 2.0, we re-evaluate the Zika, HIV, and SARS-CoV-2 genomes and compare both the consistency of results and the time of each run to ScanFold 1.0. We also re-evaluate the SARS-CoV-2 genome to assess the quality of ScanFold 2.0 predictions vs several biochemical structure probing datasets and compare the results to those of the original ScanFold program.
first_indexed 2024-03-09T07:11:32Z
format Article
id doaj.art-974e03753a5b44e8ab290d91009c1eca
institution Directory of Open Access Journals
issn 2167-8359
language English
last_indexed 2024-03-09T07:11:32Z
publishDate 2022-11-01
publisher PeerJ Inc.
record_format Article
series PeerJ
spelling doaj.art-974e03753a5b44e8ab290d91009c1eca
2023-12-03T09:06:46Z
eng
PeerJ Inc.
PeerJ
2167-8359
2022-11-01
10
e14361
10.7717/peerj.14361
ScanFold 2.0: a rapid approach for identifying potential structured RNA targets in genomes and transcriptomes
Ryan J. Andrews (Department of Biochemistry, University of Utah, Salt Lake City, UT, United States)
Warren B. Rouse (The Roy J Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, Iowa, United States)
Collin A. O’Leary (The Roy J Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, Iowa, United States)
Nicholas J. Booher (Infrastructure and Research IT Services, Iowa State University, Ames, IA, United States)
Walter N. Moss (The Roy J Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, Iowa, United States)
https://peerj.com/articles/14361.pdf
RNA
Motif discovery
RNA structure
Genome annotation
Sequence analysis
title ScanFold 2.0: a rapid approach for identifying potential structured RNA targets in genomes and transcriptomes
topic RNA
Motif discovery
RNA structure
Genome annotation
Sequence analysis
url https://peerj.com/articles/14361.pdf