PhyloCSF: a comparative genomics method to distinguish protein coding and non-coding regions
Motivation: As high-throughput transcriptome sequencing provides evidence for novel transcripts in many species, there is a renewed need for accurate methods to classify small genomic regions as protein coding or non-coding. We present PhyloCSF, a novel comparative genomics method that analyzes a mu...
Main Authors: | Lin, Michael F., Jungreis, Irwin, Kellis, Manolis |
---|---|
Other Authors: | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory |
Format: | Article |
Language: | en_US |
Published: | Oxford University Press 2012 |
Online Access: | http://hdl.handle.net/1721.1/72566 |
_version_ | 1826202404120428544 |
---|---|
author | Lin, Michael F.; Jungreis, Irwin; Kellis, Manolis |
author2 | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory |
author_facet | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Lin, Michael F.; Jungreis, Irwin; Kellis, Manolis |
author_sort | Lin, Michael F. |
collection | MIT |
description | Motivation: As high-throughput transcriptome sequencing provides evidence for novel transcripts in many species, there is a renewed need for accurate methods to classify small genomic regions as protein coding or non-coding. We present PhyloCSF, a novel comparative genomics method that analyzes a multispecies nucleotide sequence alignment to determine whether it is likely to represent a conserved protein-coding region, based on a formal statistical comparison of phylogenetic codon models.
Results: We show that PhyloCSF's classification performance in 12-species Drosophila genome alignments exceeds all other methods we compared in a previous study. We anticipate that this method will be widely applicable as the transcriptomes of many additional species, tissues and subcellular compartments are sequenced, particularly in the context of ENCODE and modENCODE, and as interest grows in long non-coding RNAs, often initially recognized by their lack of protein coding potential rather than conserved RNA secondary structures. |
first_indexed | 2024-09-23T12:06:55Z |
format | Article |
id | mit-1721.1/72566 |
institution | Massachusetts Institute of Technology |
language | en_US |
last_indexed | 2024-09-23T12:06:55Z |
publishDate | 2012 |
publisher | Oxford University Press |
record_format | dspace |
spelling | mit-1721.1/72566; 2022-10-01T08:16:39Z; PhyloCSF: a comparative genomics method to distinguish protein coding and non-coding regions; Lin, Michael F.; Jungreis, Irwin; Kellis, Manolis; Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; Kellis, Manolis; Kellis, Manolis; Lin, Michael F.; Jungreis, Irwin; Motivation: As high-throughput transcriptome sequencing provides evidence for novel transcripts in many species, there is a renewed need for accurate methods to classify small genomic regions as protein coding or non-coding. We present PhyloCSF, a novel comparative genomics method that analyzes a multispecies nucleotide sequence alignment to determine whether it is likely to represent a conserved protein-coding region, based on a formal statistical comparison of phylogenetic codon models. Results: We show that PhyloCSF's classification performance in 12-species Drosophila genome alignments exceeds all other methods we compared in a previous study. We anticipate that this method will be widely applicable as the transcriptomes of many additional species, tissues and subcellular compartments are sequenced, particularly in the context of ENCODE and modENCODE, and as interest grows in long non-coding RNAs, often initially recognized by their lack of protein coding potential rather than conserved RNA secondary structures.; National Institutes of Health (U.S.) (U54 HG004555-01); National Science Foundation (U.S.) (DBI 0644282); 2012-09-07T15:10:16Z; 2012-09-07T15:10:16Z; 2011-07; Article; http://purl.org/eprint/type/JournalArticle; 1460-2059; 1367-4803; PMC3117382; http://hdl.handle.net/1721.1/72566; Lin, Michael F., Irwin Jungreis, and Manolis Kellis. “PhyloCSF: a Comparative Genomics Method to Distinguish Protein Coding and Non-coding Regions.” Bioinformatics 27.13 (2011): i275–i282. Web.; en_US; http://dx.doi.org/10.1093/bioinformatics/btr209; Bioinformatics; Creative Commons Attribution Non-Commercial; http://creativecommons.org/licenses/by-nc/2.5; application/pdf; Oxford University Press; Oxford |
title | PhyloCSF: a comparative genomics method to distinguish protein coding and non-coding regions |
url | http://hdl.handle.net/1721.1/72566 |