The impact of gene duplication, insertion, deletion, lateral gene transfer and sequencing error on orthology inference: a simulation study.
The identification of orthologous genes, a prerequisite for numerous analyses in comparative and functional genomics, is commonly performed computationally from protein sequences. Several previous studies have compared the accuracy of orthology inference methods, but simulated data has not typically...
Main Authors: | Daniel A Dalquen, Adrian M Altenhoff, Gaston H Gonnet, Christophe Dessimoz |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2013-01-01 |
Series: | PLoS ONE |
Online Access: | http://europepmc.org/articles/PMC3581572?pdf=render |
_version_ | 1811202288495099904 |
---|---|
author | Daniel A Dalquen Adrian M Altenhoff Gaston H Gonnet Christophe Dessimoz |
author_sort | Daniel A Dalquen |
collection | DOAJ |
description | The identification of orthologous genes, a prerequisite for numerous analyses in comparative and functional genomics, is commonly performed computationally from protein sequences. Several previous studies have compared the accuracy of orthology inference methods, but simulated data has not typically been considered in cross-method assessment studies. Yet, while dependent on model assumptions, simulation-based benchmarking offers unique advantages: contrary to empirical data, all aspects of simulated data are known with certainty. Furthermore, the flexibility of simulation makes it possible to investigate performance factors in isolation of one another. Here, we use simulated data to dissect the performance of six methods for orthology inference available as standalone software packages (Inparanoid, OMA, OrthoInspector, OrthoMCL, QuartetS, SPIMAP) as well as two generic approaches (bidirectional best hit and reciprocal smallest distance). We investigate the impact of various evolutionary forces (gene duplication, insertion, deletion, and lateral gene transfer) and technological artefacts (ambiguous sequences) on orthology inference. We show that while gene duplication/loss and insertion/deletion are well handled by most methods (albeit for different trade-offs of precision and recall), lateral gene transfer disrupts all methods. As for ambiguous sequences, which might result from poor sequencing, assembly, or genome annotation, we show that they affect alignment score-based orthology methods more strongly than their distance-based counterparts. |
first_indexed | 2024-04-12T02:36:17Z |
format | Article |
id | doaj.art-3bca8898af154b70a6ae8c4a6a98c501 |
institution | Directory Open Access Journal |
issn | 1932-6203 |
language | English |
last_indexed | 2024-04-12T02:36:17Z |
publishDate | 2013-01-01 |
publisher | Public Library of Science (PLoS) |
record_format | Article |
series | PLoS ONE |
spelling | doaj.art-3bca8898af154b70a6ae8c4a6a98c501; 2022-12-22T03:51:31Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2013-01-01; vol. 8, no. 2, e56925; 10.1371/journal.pone.0056925 |
title | The impact of gene duplication, insertion, deletion, lateral gene transfer and sequencing error on orthology inference: a simulation study. |
url | http://europepmc.org/articles/PMC3581572?pdf=render |
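
The description above names bidirectional best hit as one of the two generic approaches and reports results as trade-offs of precision and recall. The following is a minimal Python sketch of that idea, not the authors' implementation: the score dictionary, the gene names, and the helper names `bidirectional_best_hits` and `precision_recall` are all hypothetical, and ties are resolved simply by keeping the first best hit seen.

```python
from typing import Dict, Set, Tuple

# Hypothetical input: similarity score for each (gene in genome A, gene in genome B).
Scores = Dict[Tuple[str, str], float]
Pairs = Set[Tuple[str, str]]


def bidirectional_best_hits(scores: Scores) -> Pairs:
    """Return pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_for_a: Dict[str, str] = {}
    best_for_b: Dict[str, str] = {}
    for (a, b), s in scores.items():
        # Keep the highest-scoring partner seen so far in each direction.
        if a not in best_for_a or s > scores[(a, best_for_a[a])]:
            best_for_a[a] = b
        if b not in best_for_b or s > scores[(best_for_b[b], b)]:
            best_for_b[b] = a
    return {(a, b) for a, b in best_for_a.items() if best_for_b.get(b) == a}


def precision_recall(predicted: Pairs, truth: Pairs) -> Tuple[float, float]:
    """Precision and recall of predicted pairs against a known truth set."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall


# Toy data (assumed values, for illustration only).
toy_scores: Scores = {
    ("a1", "b1"): 90.0,
    ("a1", "b2"): 40.0,
    ("a2", "b1"): 70.0,
    ("a2", "b2"): 30.0,
    ("a3", "b2"): 60.0,
}
toy_truth: Pairs = {("a1", "b1"), ("a2", "b2")}

predicted = bidirectional_best_hits(toy_scores)
print(sorted(predicted))
print(precision_recall(predicted, toy_truth))
```

In a simulation setting such as the one the description refers to, the truth set is known with certainty, which is what makes this kind of precision and recall computation possible.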