Computational dissection of Arabidopsis smRNAome leads to discovery of novel microRNAs and short interfering RNAs associated with transcription start sites

Bibliographic Details
Main Authors: Wang, Xiangfeng; Laurie, John D.; Liu, Tao; Wentz, Jacqueline; Liu, X. Shirley
Other Authors: Massachusetts Institute of Technology. Department of Biological Engineering
Format: Article
Language: en_US
Published: Elsevier 2014
Online Access: http://hdl.handle.net/1721.1/92050
Description
Summary: The profiling of small RNAs by high-throughput sequencing (smRNA-Seq) has revealed the complexity of the RNA world. Here, we describe a computational scheme for dissecting the plant smRNAome by integrating smRNA-Seq datasets in Arabidopsis thaliana. Our analytical approach first defines ab initio the genomic loci that produce smRNAs as basic units, then utilizes principal component analysis (PCA) to predict novel miRNAs. Secondary structure prediction of the candidates' putative precursors revealed a group of long hairpin double-stranded RNAs (lh-dsRNAs) formed by inverted duplications of decayed coding genes. These gene remnants produce miRNA-like small RNAs that are predominantly 21 and 22 nt long, dependent on DCL1 but independent of RDR2 and DCL2/3/4, and associated with AGO1. Additionally, we found two classes of transcription start site-associated (TSSa) RNAs, located on the sense (+) and antisense (−) strands approximately 100–200 bp downstream of TSSs, which are differentially incorporated into AGO1 and AGO4, respectively.
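
The first step named in the summary, defining smRNA-producing loci ab initio, can be illustrated with a minimal sketch. This is not the authors' implementation: the read tuple layout, the `max_gap` merging threshold of 100 bp, the chosen size classes (21, 22 and 24 nt), and the function names `define_loci` and `length_profile` are all assumptions made for illustration. The sketch merges nearby mapped reads into loci and summarizes each locus by its read-length composition; per-locus feature vectors of this general kind are what a PCA step could take as input.

```python
from collections import Counter


def define_loci(reads, max_gap=100):
    """Merge mapped reads into loci (hypothetical rule, not from the paper).

    reads: iterable of (chrom, start, end) tuples, 0-based half-open.
    A read joins the current locus if it lies on the same chromosome and
    starts no more than max_gap bases past the current locus end.
    """
    loci = []
    for chrom, start, end in sorted(reads):
        last = loci[-1] if loci else None
        if last and last["chrom"] == chrom and start - last["end"] <= max_gap:
            # Extend the open locus and record this read's length.
            last["end"] = max(last["end"], end)
            last["lengths"][end - start] += 1
        else:
            # Start a new locus seeded by this read.
            loci.append({
                "chrom": chrom,
                "start": start,
                "end": end,
                "lengths": Counter({end - start: 1}),
            })
    return loci


def length_profile(locus, sizes=(21, 22, 24)):
    """Fraction of a locus's reads falling in each size class (assumed classes)."""
    total = sum(locus["lengths"].values())
    return [locus["lengths"][s] / total for s in sizes]


# Toy reads, invented purely to exercise the functions.
reads = [
    ("Chr1", 100, 121),    # 21 nt
    ("Chr1", 110, 132),    # 22 nt
    ("Chr1", 150, 171),    # 21 nt
    ("Chr1", 5000, 5024),  # 24 nt
    ("Chr2", 40, 64),      # 24 nt
]
loci = define_loci(reads, max_gap=100)
profiles = [length_profile(locus) for locus in loci]
```

A locus dominated by 21- and 22-nt reads would show up here as a profile weighted toward the first two entries, which is the kind of signal the summary associates with miRNA-like small RNAs. A real pipeline would add further features (strand bias, abundance, precursor structure) before any dimensionality reduction.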