Locating protein-coding sequences under selection for additional, overlapping functions in 29 mammalian genomes

Bibliographic Details
Main Authors: Lin, Michael F., Kheradpour, Pouya, Mag Washietl, Stefan, Parker, Brian J., Pedersen, Jakob S., Kellis, Manolis
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format: Article
Language: en_US
Published: Cold Spring Harbor Laboratory Press, 2011 (added to the MIT repository 2012)
Online Access: http://hdl.handle.net/1721.1/73052

Abstract

The degeneracy of the genetic code allows protein-coding DNA and RNA sequences to simultaneously encode additional, overlapping functional elements. A sequence in which both protein-coding and additional overlapping functions have evolved under purifying selection should show increased evolutionary conservation compared to typical protein-coding genes, especially at synonymous sites. In this study, we use genome alignments of 29 placental mammals to systematically locate short regions within human ORFs that show conspicuously low estimated rates of synonymous substitution across these species. The 29-species alignment provides statistical power to locate more than 10,000 such regions with resolution down to nine-codon windows, which are found within more than a quarter of all human protein-coding genes and contain ∼2% of their synonymous sites. We collect numerous lines of evidence that the observed synonymous constraint in these regions reflects selection on overlapping functional elements including splicing regulatory elements, dual-coding genes, RNA secondary structures, microRNA target sites, and developmental enhancers. Our results show that overlapping functional elements are common in mammalian genes, despite the vast genomic landscape.
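
The abstract's core idea is that extra purifying selection on synonymous sites shows up as short stretches of an ORF where orthologs rarely carry synonymous changes. The toy sketch below illustrates only that intuition under simplifying assumptions: it counts synonymous codon differences between a human ORF and pre-aligned, gap-free orthologous sequences, then flags nine-codon windows with few such changes. It is not the authors' method, which works from estimated synonymous substitution rates across the 29-species alignment; the names `CODON_TABLE`, `synonymous_changes`, `low_synonymous_windows` and the `max_changes` threshold are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# TCAG order with the first base varying slowest, matching the amino-acid string.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_changes(ref, other):
    """Hypothetical helper: for each codon of `ref`, return 1 if the aligned
    codon in `other` differs but encodes the same amino acid, else 0.
    Assumes uppercase, gap-free, in-frame sequences of equal length; codons
    containing gaps or ambiguous bases are simply scored 0."""
    flags = []
    for i in range(0, len(ref) - len(ref) % 3, 3):
        a, b = ref[i:i + 3], other[i:i + 3]
        same_aa = a in CODON_TABLE and b in CODON_TABLE and CODON_TABLE[a] == CODON_TABLE[b]
        flags.append(1 if same_aa and a != b else 0)
    return flags


def low_synonymous_windows(ref, orthologs, window=9, max_changes=0):
    """Hypothetical helper: sum synonymous changes per codon over all
    orthologs, then return the start indices (in codons) of every sliding
    window of `window` codons whose total is at most `max_changes`."""
    per_codon = [0] * (len(ref) // 3)
    for seq in orthologs:
        for j, flag in enumerate(synonymous_changes(ref, seq)):
            per_codon[j] += flag
    return [
        start
        for start in range(len(per_codon) - window + 1)
        if sum(per_codon[start:start + window]) <= max_changes
    ]


if __name__ == "__main__":
    ref = "CTG" * 12                     # 12 leucine codons
    ortho1 = "CTG" * 10 + "CTA" + "CTG"  # synonymous Leu change at codon 10
    ortho2 = "ATG" + "CTG" * 11          # nonsynonymous Leu->Met change, not counted
    print(low_synonymous_windows(ref, [ortho1, ortho2]))  # [0, 1]
```

Raw counts like these ignore the phylogeny and the differing number of synonymous opportunities per codon, which is why the abstract speaks of estimated substitution rates rather than simple tallies.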

Additional Details
Affiliations: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Journal: Genome Research (ISSN 1088-9051)
Citation: Lin, M. F. et al. “Locating Protein-coding Sequences Under Selection for Additional, Overlapping Functions in 29 Mammalian Genomes.” Genome Research 21.11 (2011): 1916–1928. © 2011 by Cold Spring Harbor Laboratory Press
DOI: http://dx.doi.org/10.1101/gr.108753.110
Funding: National Science Foundation (U.S.) (DBI 0644282); National Institutes of Health (U.S.) (U54 HG004555-01)
Type: Journal article (http://purl.org/eprint/type/JournalArticle), application/pdf
Dates: 2011-10, 2010-04; added to repository 2012-09-19
License: Creative Commons Attribution-NonCommercial 3.0 Unported License (http://creativecommons.org/licenses/by-nc/3.0/)