Characterization of paralogous protein families in rice


Bibliographic Details
Main Authors: Lin, Haining, Ouyang, Shu, Egan, Amy, Nobuta, Kan, Zhu, Wei, Gu, Xun, Silva, Joana C, Meyers, Blake C, Buell, C. Robin, Haas, Brian J.
Other Authors: Broad Institute of MIT and Harvard
Format: Article
Language: English
Published: BioMed Central Ltd 2010
Online Access: http://hdl.handle.net/1721.1/58916
Description:

Background: High gene numbers in plant genomes reflect polyploidy and major gene duplication events. Oryza sativa, cultivated rice, is a diploid monocotyledonous species with a ~390 Mb genome that has undergone segmental duplication of a substantial portion of its genome. This, coupled with other genetic events such as tandem duplications, has resulted in a substantial number of its genes, and resulting proteins, occurring in paralogous families.

Results: Using a computational pipeline that utilizes Pfam and novel protein domains, we characterized paralogous families in rice and compared these with paralogous families in the model dicotyledonous diploid species, Arabidopsis thaliana. Arabidopsis, which has undergone genome duplication as well, has a substantially smaller genome (~120 Mb) and gene complement compared to rice. Overall, 53% and 68% of the non-transposable element-related rice and Arabidopsis proteins could be classified into paralogous protein families, respectively. Singleton and paralogous family genes differed substantially in their likelihood of encoding a protein of known or putative function; 26% and 66% of singleton genes compared to 73% and 96% of the paralogous family genes encode a known or putative protein in rice and Arabidopsis, respectively. Furthermore, a major skew in the distribution of specific gene function was observed; a total of 17 Gene Ontology categories in both rice and Arabidopsis were statistically significant in their differential distribution between paralogous family and singleton proteins. In contrast to mammalian organisms, we found that duplicated genes in rice and Arabidopsis tend to have more alternative splice forms. Using data from Massively Parallel Signature Sequencing, we show that a significant portion of the duplicated genes in rice show divergent expression although a correlation between sequence divergence and correlation of expression could be seen in very young genes.

Conclusion: Collectively, these data suggest that while co-regulation and conserved function are present in some paralogous protein family members, evolutionary pressures have resulted in functional divergence with differential expression patterns.
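The idea of sorting proteins into paralogous families and singletons by their domain content can be illustrated with a small sketch. This is not the authors' pipeline, which the record does not describe in detail; it assumes, purely for illustration, that two proteins belong to the same family when they carry exactly the same set of domains. The function names, protein IDs, and domain labels below are all hypothetical.

```python
from collections import defaultdict


def group_by_domains(protein_domains):
    """Group proteins whose domain sets are identical (illustrative rule only).

    protein_domains: dict mapping protein ID -> iterable of domain names.
    Returns (families, singletons): families is a sorted list of sorted ID
    lists with two or more members; singletons is a sorted list of the rest.
    """
    groups = defaultdict(list)
    for protein, domains in protein_domains.items():
        # Domain order and repeats are ignored in this toy rule.
        groups[frozenset(domains)].append(protein)

    families, singletons = [], []
    for domain_set, members in groups.items():
        # Proteins without any domain cannot be grouped; count them as singletons.
        if domain_set and len(members) >= 2:
            families.append(sorted(members))
        else:
            singletons.extend(members)
    return sorted(families), sorted(singletons)


def fraction_in_families(families, singletons):
    """Fraction of all proteins that fall into a family of size >= 2."""
    in_family = sum(len(family) for family in families)
    total = in_family + len(singletons)
    return in_family / total if total else 0.0


# Hypothetical toy input: six proteins with made-up domain labels.
toy = {
    "P1": ["DOM_A"],
    "P2": ["DOM_A"],
    "P3": ["DOM_A", "DOM_B"],
    "P4": [],
    "P5": ["DOM_C", "DOM_D"],
    "P6": ["DOM_D", "DOM_C"],
}
families, singletons = group_by_domains(toy)
print(families, singletons, fraction_in_families(families, singletons))
```

A summary figure such as the share of proteins classified into families corresponds to `fraction_in_families` in this sketch; a real analysis would need a far more careful family definition than exact domain-set identity.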
Institution: Massachusetts Institute of Technology
Funding: National Science Foundation (U.S.), Plant Genome Research Program (DBI-0321538); National Science Foundation (U.S.) (DBI-0321437)
Citation: BMC Plant Biology. 2008 Feb 19;8(1):18
ISSN: 1471-2229
DOI: http://dx.doi.org/10.1186/1471-2229-8-18
License: Creative Commons Attribution, http://creativecommons.org/licenses/by/2.0 (Lin et al.; licensee BioMed Central Ltd.)