A comprehensive benchmark of graph-based genetic variant genotyping algorithms on plant genomes for creating an accurate ensemble pipeline

Bibliographic Details
Main Authors: Ze-Zhen Du, Jia-Bao He, Wen-Biao Jiao
Format: Article
Language: English
Published: BMC 2024-04-01
Series: Genome Biology
Subjects: Genome graph; Plant genomes; Genotyping; Structural variation; Benchmarking
Online Access: https://doi.org/10.1186/s13059-024-03239-1
collection DOAJ
description Abstract
Background: Although sequencing technologies have boosted the measurement of the genomic diversity of plant crops, it remains challenging to accurately genotype millions of genetic variants, especially structural variations, with only short reads. In recent years, many graph-based variation genotyping methods have been developed to address this issue and tested for human genomes. However, their performance in plant genomes remains largely elusive. Furthermore, pipelines integrating the advantages of current genotyping methods might be required, considering the different complexity of plant genomes.
Results: Here we comprehensively evaluate eight such genotypers in different scenarios in terms of variant type and size, sequencing parameters, genomic context, and complexity, as well as graph size, using both simulated and real data sets from representative plant genomes. Our evaluation reveals that there are still great challenges to applying existing methods to plants, such as excessive repeats and variants or high resource consumption. Therefore, we propose a pipeline called Ensemble Variant Genotyper (EVG) that can achieve better genotyping performance in almost all experimental scenarios and comparably higher genotyping recall and precision even using 5× reads. Furthermore, we demonstrate that EVG is more robust with an increasing number of graphed genomes, especially for insertions and deletions.
Conclusions: Our study will provide new insights into the development and application of graph-based genotyping algorithms. We conclude that EVG provides an accurate, unbiased, and cost-effective way for genotyping both small and large variations and will be potentially used in population-scale genotyping for large, repetitive, and heterozygous plant genomes.
first_indexed 2024-04-24T09:52:26Z
format Article
id doaj.art-9ffb5b1deabd444485d6e01a4171ae05
institution Directory of Open Access Journals
issn 1474-760X
language English
last_indexed 2024-04-24T09:52:26Z
publishDate 2024-04-01
publisher BMC
record_format Article
series Genome Biology
spelling Ze-Zhen Du, Jia-Bao He, Wen-Biao Jiao (all authors: National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, Huazhong Agricultural University). Genome Biology (BMC), ISSN 1474-760X, vol. 25, no. 1, pp. 1-24, 2024-04-01. https://doi.org/10.1186/s13059-024-03239-1
topic Genome graph
Plant genomes
Genotyping
Structural variation
Benchmarking
url https://doi.org/10.1186/s13059-024-03239-1