Different structural variant prediction tools yield considerably different results in Caenorhabditis elegans.

The accurate characterization of structural variation is crucial for our understanding of how large chromosomal alterations affect phenotypic differences and contribute to genome evolution. Whole-genome sequencing is a popular approach for identifying structural variants, but the accuracy of popular...

Bibliographic Details
Main Authors: Kyle Lesack, Grace M Mariene, Erik C Andersen, James D Wasmuth
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2022-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0278424
author Kyle Lesack
Grace M Mariene
Erik C Andersen
James D Wasmuth
collection DOAJ
description The accurate characterization of structural variation is crucial for our understanding of how large chromosomal alterations affect phenotypic differences and contribute to genome evolution. Whole-genome sequencing is a popular approach for identifying structural variants, but the accuracy of popular tools remains unclear due to the limitations of existing benchmarks. Moreover, the performance of these tools for predicting variants in non-human genomes is less certain, as most tools were developed and benchmarked using data from the human genome. To evaluate the use of long-read data for the validation of short-read structural variant calls, the agreement between predictions from a short-read ensemble learning method and long-read tools was compared using real and simulated data from Caenorhabditis elegans. The results obtained from simulated data indicate that the best-performing tool is contingent on the type and size of the variant, as well as the sequencing depth of coverage. These results also highlight the need for reference datasets generated from real data that can be used as 'ground truth' in benchmarks.
first_indexed 2024-04-10T23:11:38Z
format Article
id doaj.art-e1bbe9f985ec4cc0b9882ab26bc0ac55
institution Directory of Open Access Journals
issn 1932-6203
language English
last_indexed 2024-04-10T23:11:38Z
publishDate 2022-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj.art-e1bbe9f985ec4cc0b9882ab26bc0ac55; 2023-01-13T05:31:27Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2022-01-01; 17(12): e0278424; 10.1371/journal.pone.0278424; Different structural variant prediction tools yield considerably different results in Caenorhabditis elegans.; Kyle Lesack, Grace M Mariene, Erik C Andersen, James D Wasmuth; https://doi.org/10.1371/journal.pone.0278424
title Different structural variant prediction tools yield considerably different results in Caenorhabditis elegans.
url https://doi.org/10.1371/journal.pone.0278424