Performance and scalability of discriminative metrics for comparative gene identification in 12 Drosophila genomes.

Bibliographic Details
Main Authors: Michael F Lin, Ameya N Deoras, Matthew D Rasmussen, Manolis Kellis
Format: Article
Language: English
Published: Public Library of Science (PLoS), 2008-04-01
Series: PLoS Computational Biology
Online Access: http://europepmc.org/articles/PMC2291194?pdf=render
Citation: PLoS Computational Biology 4(4): e1000067
DOI: 10.1371/journal.pcbi.1000067
ISSN: 1553-734X, 1553-7358

Description
Comparative genomics of multiple related species is a powerful methodology for the discovery of functional genomic elements, and its power should increase with the number of species compared. Here, we use 12 Drosophila genomes to study the power of comparative genomics metrics to distinguish between protein-coding and non-coding regions. First, we study the relative power of different comparative metrics and their relationship to single-species metrics. We find that even relatively simple multi-species metrics robustly outperform advanced single-species metrics, especially for shorter exons (≤240 nt), which are common in animal genomes. Moreover, the two capture largely independent features of protein-coding genes, with different sensitivity/specificity trade-offs, such that their combinations lead to even greater discriminatory power. In addition, we study how discovery power scales with the number and phylogenetic distance of the genomes compared. We find that species at a broad range of distances are comparably effective informants for pairwise comparative gene identification, but that these are surpassed by multi-species comparisons at similar evolutionary divergence. In particular, while pairwise discovery power plateaued at larger distances and never outperformed the most advanced single-species metrics, multi-species comparisons continued to benefit even from the most distant species with no apparent saturation. Last, we find that genes in functional categories typically considered fast-evolving can nonetheless be recovered at very high rates using comparative methods. Our results have implications for comparative genomics analyses in any species, including the human.
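The abstract frames discriminatory power in terms of sensitivity/specificity trade-offs and reports that combining metrics that capture largely independent features improves it. The sketch below is a hypothetical illustration of those two ideas, not the authors' method. The names `auc`, `sens_spec`, `gaussian_llr` and `combine` are introduced here. The combination rule is an assumed stand-in: Gaussian naive Bayes, which sums per-metric log-likelihood ratios. The scores are invented toy numbers, not data from the paper.

```python
import math
import statistics
from typing import Callable, List, Sequence, Tuple

Row = Tuple[float, ...]


def auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """ROC AUC computed as the Mann-Whitney statistic: the probability that a
    random coding example scores above a random non-coding one (ties count 1/2)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sens_spec(pos: Sequence[float], neg: Sequence[float], threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when calling 'coding' at score >= threshold."""
    tp = sum(1 for s in pos if s >= threshold)
    tn = sum(1 for s in neg if s < threshold)
    return tp / len(pos), tn / len(neg)


def gaussian_llr(pos_train: Sequence[float], neg_train: Sequence[float]) -> Callable[[float], float]:
    """Fit one Gaussian per class; return x -> log p(x | coding) - log p(x | non-coding)."""
    mu_c, sd_c = statistics.mean(pos_train), statistics.stdev(pos_train)
    mu_n, sd_n = statistics.mean(neg_train), statistics.stdev(neg_train)

    def log_pdf(x: float, mu: float, sd: float) -> float:
        return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd) - 0.5 * math.log(2 * math.pi)

    return lambda x: log_pdf(x, mu_c, sd_c) - log_pdf(x, mu_n, sd_n)


def combine(pos_rows: List[Row], neg_rows: List[Row]) -> Tuple[List[float], List[float]]:
    """Naive-Bayes combination: sum the per-feature log-likelihood ratios. This equals
    the joint log-likelihood ratio only if features are independent given the class."""
    n_features = len(pos_rows[0])
    llrs = [
        gaussian_llr([r[j] for r in pos_rows], [r[j] for r in neg_rows])
        for j in range(n_features)
    ]

    def score(row: Row) -> float:
        return sum(f(x) for f, x in zip(llrs, row))

    return [score(r) for r in pos_rows], [score(r) for r in neg_rows]


# Invented toy scores for two hypothetical metrics (A, B); NOT data from the paper.
coding = [(3, 0), (0, 3), (3, 3), (2, 2)]
noncoding = [(1, 1), (0, 0), (1, 0), (0, 1)]

for j, name in enumerate(["A", "B"]):
    print(name, auc([r[j] for r in coding], [r[j] for r in noncoding]))  # 0.8125 each
pos_s, neg_s = combine(coding, noncoding)
print("A+B", auc(pos_s, neg_s))  # 1.0 on this toy set
```

On this invented example, each metric alone mis-ranks one coding row, while the summed log-likelihood ratio separates the two classes completely. The models here are fit and scored on the same rows for brevity. A real evaluation would fit on training regions and measure sensitivity, specificity and AUC on held-out ones.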