Use of artificial genomes in assessing methods for atypical gene detection.

Parametric methods for identifying laterally transferred genes exploit the directional mutational biases unique to each genome. Yet the development of new, more robust methods--as well as the evaluation and proper implementation of existing methods--relies on an arbitrary assessment of performance u...

Bibliographic Details
Main Authors: Rajeev K Azad, Jeffrey G Lawrence
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2005-11-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.0010056
author Rajeev K Azad
Jeffrey G Lawrence
collection DOAJ
description Parametric methods for identifying laterally transferred genes exploit the directional mutational biases unique to each genome. Yet the development of new, more robust methods--as well as the evaluation and proper implementation of existing methods--relies on an arbitrary assessment of performance using real genomes, where the evolutionary histories of genes are not known. We have used the framework of a generalized hidden Markov model to create artificial genomes modeled after genuine genomes. To model a genome, "core" genes--those displaying patterns of mutational biases shared among large numbers of genes--are identified by a novel gene clustering approach based on the Akaike information criterion. Gene models derived from multiple "core" gene clusters are used to generate an artificial genome that models the properties of a genuine genome. Chimeric artificial genomes--representing those having experienced lateral gene transfer--were created by combining genes from multiple artificial genomes, and the performance of the parametric methods for identifying "atypical" genes was assessed directly. We found that a hidden Markov model that included multiple gene models, each trained on sets of genes representing the range of genotypic variability within a genome, could produce artificial genomes that mimicked the properties of genuine genomes. Moreover, different methods for detecting foreign genes performed differently--i.e., they had different sets of strengths and weaknesses--when identifying atypical genes within chimeric artificial genomes.
first_indexed 2024-12-21T03:49:12Z
format Article
id doaj.art-322bc778f5ea4eefb0b9a3f3a89bb8c0
institution Directory Open Access Journal
issn 1553-734X
1553-7358
language English
last_indexed 2024-12-21T03:49:12Z
publishDate 2005-11-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS Computational Biology
spelling doaj.art-322bc778f5ea4eefb0b9a3f3a89bb8c0 | 2022-12-21T19:17:01Z | eng | Public Library of Science (PLoS) | PLoS Computational Biology | 1553-734X; 1553-7358 | 2005-11-01 | 1(6):e56 | 10.1371/journal.pcbi.0010056 | Use of artificial genomes in assessing methods for atypical gene detection. | Rajeev K Azad; Jeffrey G Lawrence | https://doi.org/10.1371/journal.pcbi.0010056
title Use of artificial genomes in assessing methods for atypical gene detection.
url https://doi.org/10.1371/journal.pcbi.0010056
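
The description above mentions a gene clustering approach based on the Akaike information criterion (AIC). The record does not give the details of that procedure, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a deliberately simple model (independent nucleotide frequencies per gene, sequences containing only A, C, G, T) and asks whether two genes are better described by one shared model or by two separate ones. The function names `log_likelihood`, `aic`, and `should_merge` are hypothetical, chosen for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumption: sequences contain only these four symbols


def log_likelihood(counts):
    """Maximized log-likelihood of a sequence under an i.i.d. nucleotide model.

    The maximum-likelihood frequency of each nucleotide is its observed
    proportion, so the log-likelihood is sum(n * log(n / total)).
    """
    total = sum(counts.values())
    return sum(n * math.log(n / total) for n in counts.values() if n > 0)


def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_lik


def should_merge(seq_a, seq_b):
    """Return True if one shared model has an AIC no worse than two models."""
    counts_a, counts_b = Counter(seq_a), Counter(seq_b)
    k = len(ALPHABET) - 1  # free parameters of one frequency vector

    # Two separate models: likelihoods add, parameter counts add.
    aic_separate = aic(log_likelihood(counts_a) + log_likelihood(counts_b), 2 * k)

    # One shared model fitted to the pooled counts.
    aic_merged = aic(log_likelihood(counts_a + counts_b), k)

    return aic_merged <= aic_separate


if __name__ == "__main__":
    similar = should_merge("ACGT" * 50, "ACGT" * 50)
    different = should_merge("GC" * 90 + "AT" * 10, "AT" * 90 + "GC" * 10)
    print("similar composition merged:", similar)
    print("different composition merged:", different)
```

In this sketch, genes with matching composition are pooled because the shared model explains them equally well with fewer parameters, while genes with strongly different composition stay apart. A realistic implementation would likely use richer gene models (for example codon usage or higher-order Markov chains) and apply the comparison repeatedly across many genes.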