Knotify: An Efficient Parallel Platform for RNA Pseudoknot Prediction Using Syntactic Pattern Recognition

Obtaining valuable clues for noncoding RNA (ribonucleic acid) subsequences remains a significant challenge, acknowledging that most of the human genome transcribes into noncoding RNA parts related to unknown biological operations. Capturing these clues relies on accurate “base pairing” prediction, a...

Bibliographic Details
Main Authors: Christos Andrikos, Evangelos Makris, Angelos Kolaitis, Georgios Rassias, Christos Pavlatos, Panayiotis Tsanakas
Format: Article
Language: English
Published: MDPI AG 2022-02-01
Series: Methods and Protocols
Subjects: RNA secondary structure, pseudoknot, syntactic pattern recognition, context-free grammar
Online Access: https://www.mdpi.com/2409-9279/5/1/14
author Christos Andrikos
Evangelos Makris
Angelos Kolaitis
Georgios Rassias
Christos Pavlatos
Panayiotis Tsanakas
collection DOAJ
description Obtaining valuable clues for noncoding RNA (ribonucleic acid) subsequences remains a significant challenge, given that most of the human genome is transcribed into noncoding RNA related to unknown biological operations. Capturing these clues relies on accurate “base pairing” prediction, also known as “RNA secondary structure prediction”. As COVID-19 is considered a severe global threat, the single-stranded SARS-CoV-2 virus reveals the importance of establishing an efficient RNA analysis toolkit. This work aimed to contribute to that goal by introducing a novel system for predicting RNA secondary structure patterns (i.e., RNA pseudoknots) that leverages syntactic pattern-recognition strategies. Focusing on pseudoknot prediction, we formalized RNA secondary structure prediction as primarily a parsing problem and secondarily an optimization problem. The proposed methodology addresses the problem of predicting pseudoknots of the first order (H-type). We introduce a context-free grammar (CFG) with enough expressive power to recognize potential pseudoknot patterns. In addition, an alternative methodology for detecting possible pseudoknots is implemented using a brute-force algorithm. Any input sequence may exhibit multiple potential folding patterns, requiring a strict methodology to determine the single biologically realistic one. We devised a novel heuristic over the widely accepted notion of free-energy minimization to resolve such ambiguity efficiently, using each pattern’s context to identify the most prominent pseudoknot pattern. The overall process has polynomial-time complexity, while its parallel implementation improves end performance in proportion to the deployed hardware. The proposed methodology succeeds in predicting the core stems of the RNA pseudoknots in the test dataset with a recall of 76.4%. The methodology achieved an F1-score of 0.774 and an MCC of 0.543 in discovering all the stems of an RNA sequence, outperforming the particular task. Measurements were taken on a dataset of 262 RNA sequences, establishing a speedup of 1.31, 3.45, and 7.76 compared to three well-known platforms. The implementation source code is publicly available in the knotify GitHub repo.
first_indexed 2024-03-09T21:19:54Z
format Article
id doaj.art-8b2ebcd2820142a48c94877c9e6841b1
institution Directory Open Access Journal
issn 2409-9279
language English
last_indexed 2024-03-09T21:19:54Z
publishDate 2022-02-01
publisher MDPI AG
record_format Article
series Methods and Protocols
spelling doaj.art-8b2ebcd2820142a48c94877c9e6841b1
2023-11-23T21:24:29Z
eng
MDPI AG
Methods and Protocols
2409-9279
2022-02-01
5
1
14
10.3390/mps5010014
Knotify: An Efficient Parallel Platform for RNA Pseudoknot Prediction Using Syntactic Pattern Recognition
Christos Andrikos, School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
Evangelos Makris, School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
Angelos Kolaitis, School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
Georgios Rassias, School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
Christos Pavlatos, Hellenic Air Force Academy, Dekelia Air Base, Acharnes, 13671 Athens, Greece
Panayiotis Tsanakas, School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
https://www.mdpi.com/2409-9279/5/1/14
RNA secondary structure
pseudoknot
syntactic pattern recognition
context-free grammar
title Knotify: An Efficient Parallel Platform for RNA Pseudoknot Prediction Using Syntactic Pattern Recognition
topic RNA secondary structure
pseudoknot
syntactic pattern recognition
context-free grammar
url https://www.mdpi.com/2409-9279/5/1/14