ICAnnoLncRNA: A Snakemake Pipeline for a Long Non-Coding-RNA Search and Annotation in Transcriptomic Sequences

Long non-coding RNAs (lncRNAs) are RNA molecules longer than 200 nucleotides that do not encode proteins. Experimental studies have shown the diversity and importance of lncRNA functions in plants. To expand knowledge about lncRNAs in other species, computational pipelines that allow for standardise...

Full description

Bibliographic Details
Main Authors: Artem Yu. Pronozin, Dmitry A. Afonnikov
Format: Article
Language:English
Published: MDPI AG 2023-06-01
Series:Genes
Subjects:
Online Access:https://www.mdpi.com/2073-4425/14/7/1331
Description
Summary:Long non-coding RNAs (lncRNAs) are RNA molecules longer than 200 nucleotides that do not encode proteins. Experimental studies have shown the diversity and importance of lncRNA functions in plants. To expand knowledge about lncRNAs in other species, computational pipelines that allow for standardised data-processing steps in a mode that does not require user control up until the final result were actively developed recently. These advancements enable wider functionality for lncRNA data identification and analysis. In the present work, we propose the ICAnnoLncRNA pipeline for the automatic identification, classification and annotation of plant lncRNAs in assembled transcriptomic sequences. It uses the LncFinder software for the identification of lncRNAs and allows the adjustment of recognition parameters using genomic data for which lncRNA annotation is available. The pipeline allows the prediction of lncRNA candidates, alignment of lncRNA sequences to the reference genome, filtering of erroneous/noise transcripts and probable transposable elements, lncRNA classification by genome location, comparison with sequences from external databases and analysis of lncRNA structural features and expression. We used transcriptomic sequences from 15 maize libraries assembled by Trinity and Hisat2/StringTie to demonstrate the application of the ICAnnoLncRNA pipeline.
ISSN:2073-4425