A comprehensive benchmark of graph-based genetic variant genotyping algorithms on plant genomes for creating an accurate ensemble pipeline
Abstract

Background: Although sequencing technologies have boosted the measurement of the genomic diversity of plant crops, it remains challenging to accurately genotype millions of genetic variants, especially structural variations, with only short reads. In recent years, many graph-based variation genotyping methods have been developed to address this issue and tested for human genomes. However, their performance in plant genomes remains largely elusive. Furthermore, pipelines integrating the advantages of current genotyping methods might be required, considering the different complexity of plant genomes.

Results: Here we comprehensively evaluate eight such genotypers in different scenarios in terms of variant type and size, sequencing parameters, genomic context, and complexity, as well as graph size, using both simulated and real data sets from representative plant genomes. Our evaluation reveals that there are still great challenges to applying existing methods to plants, such as excessive repeats and variants or high resource consumption. Therefore, we propose a pipeline called Ensemble Variant Genotyper (EVG) that can achieve better genotyping performance in almost all experimental scenarios and comparably higher genotyping recall and precision even using 5× reads. Furthermore, we demonstrate that EVG is more robust with an increasing number of graphed genomes, especially for insertions and deletions.

Conclusions: Our study will provide new insights into the development and application of graph-based genotyping algorithms. We conclude that EVG provides an accurate, unbiased, and cost-effective way for genotyping both small and large variations and will be potentially used in population-scale genotyping for large, repetitive, and heterozygous plant genomes.

Main Authors: | Ze-Zhen Du, Jia-Bao He, Wen-Biao Jiao |
---|---|
Affiliation: | National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, Huazhong Agricultural University |
Format: | Article |
Language: | English |
Published: | BMC, 2024-04-01 |
Series: | Genome Biology |
ISSN: | 1474-760X |
Subjects: | Genome graph; Plant genomes; Genotyping; Structural variation; Benchmarking |
Collection: | DOAJ (Directory of Open Access Journals) |
Online Access: | https://doi.org/10.1186/s13059-024-03239-1 |