The impact of gene duplication, insertion, deletion, lateral gene transfer and sequencing error on orthology inference: a simulation study.

The identification of orthologous genes, a prerequisite for numerous analyses in comparative and functional genomics, is commonly performed computationally from protein sequences. Several previous studies have compared the accuracy of orthology inference methods, but simulated data has not typically...


Bibliographic Details
Main Authors: Daniel A Dalquen, Adrian M Altenhoff, Gaston H Gonnet, Christophe Dessimoz
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2013-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC3581572?pdf=render
collection DOAJ
description The identification of orthologous genes, a prerequisite for numerous analyses in comparative and functional genomics, is commonly performed computationally from protein sequences. Several previous studies have compared the accuracy of orthology inference methods, but simulated data has not typically been considered in cross-method assessment studies. Yet, while dependent on model assumptions, simulation-based benchmarking offers unique advantages: contrary to empirical data, all aspects of simulated data are known with certainty. Furthermore, the flexibility of simulation makes it possible to investigate performance factors in isolation of one another. Here, we use simulated data to dissect the performance of six methods for orthology inference available as standalone software packages (Inparanoid, OMA, OrthoInspector, OrthoMCL, QuartetS, SPIMAP) as well as two generic approaches (bidirectional best hit and reciprocal smallest distance). We investigate the impact of various evolutionary forces (gene duplication, insertion, deletion, and lateral gene transfer) and technological artefacts (ambiguous sequences) on orthology inference. We show that while gene duplication/loss and insertion/deletion are well handled by most methods (albeit for different trade-offs of precision and recall), lateral gene transfer disrupts all methods. As for ambiguous sequences, which might result from poor sequencing, assembly, or genome annotation, we show that they affect alignment score-based orthology methods more strongly than their distance-based counterparts.
id doaj.art-3bca8898af154b70a6ae8c4a6a98c501
institution Directory Open Access Journal
issn 1932-6203
citation PLoS ONE 8(2): e56925 (2013-01-01)
doi 10.1371/journal.pone.0056925