Methods for selecting fixed-effect models for heterogeneous codon evolution, with comments on their application to gene and genome data

Bibliographic Details
Main Authors: Dunn Katherine A, Gu Hong, Bao Le, Bielawski Joseph P
Format: Article
Language: English
Published: BMC 2007-02-01
Series: BMC Evolutionary Biology
collection DOAJ
description Abstract
Background: Models of codon evolution have proven useful for investigating the strength and direction of natural selection. In some cases, a priori biological knowledge has been used successfully to model heterogeneous evolutionary dynamics among codon sites. Such models are called fixed-effect models, and they require that every codon site be assigned to one of several partitions, which are permitted to have independent parameters for selection pressure, evolutionary rate, transition to transversion ratio, or codon frequencies. For single-gene analysis, partitions might be defined according to protein tertiary structure; for multi-gene analysis, partitions might be defined according to a gene's functional category. Given a set of related fixed-effect models, the task of selecting the model that best fits the data is not trivial.
Results: In this study, we implement a set of fixed-effect codon models that allow for different levels of heterogeneity among partitions in the substitution process. We describe strategies for selecting among these models by a backward elimination procedure, the Akaike information criterion (AIC), or a corrected Akaike information criterion (AICc). We evaluate the performance of these model selection methods via a simulation study and make several recommendations for real data analysis. Our simulation study indicates that the backward elimination procedure can provide a reliable method for model selection in this setting. We also demonstrate the utility of these models by application to a single-gene dataset partitioned according to tertiary structure (abalone sperm lysin) and a multi-gene dataset partitioned according to the functional category of the gene (flagellar-related proteins of Listeria).
Conclusion: Fixed-effect models have advantages and disadvantages. They are desirable when data partitions are known to exhibit significant heterogeneity or when a statistical test of such heterogeneity is desired. Their disadvantage is that they require a priori knowledge for partitioning sites. We recommend: (i) selecting models by backward elimination rather than by AIC or AICc, (ii) using a stringent cut-off, e.g., p = 0.0001, and (iii) conducting a sensitivity analysis of the results. With thoughtful application, fixed-effect codon models should provide a useful tool for large-scale multi-gene analyses.
first_indexed 2024-12-14T23:48:44Z
id doaj.art-e80d865d122d4af4bfde693af6f90359
institution Directory Open Access Journal
issn 1471-2148
last_indexed 2024-12-14T23:48:44Z
spelling doaj.art-e80d865d122d4af4bfde693af6f90359; 2022-12-21T22:43:18Z; eng; BMC; BMC Evolutionary Biology; 1471-2148; 2007-02-01; 7; Suppl 1; S5; 10.1186/1471-2148-7-S1-S5