VarSCAT: A computational tool for sequence context annotations of genomic variants.

Bibliographic Details
Main Authors: Ning Wang, Sofia Khan, Laura L Elo
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2023-08-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1010727
author Ning Wang
Sofia Khan
Laura L Elo
collection DOAJ
description The sequence contexts of genomic variants play important roles in understanding the biological significance of variants and potential sequencing-related variant calling issues. However, methods for assessing the diverse sequence contexts of genomic variants, such as tandem repeats and unambiguous annotations, have been limited. Herein, we describe the Variant Sequence Context Annotation Tool (VarSCAT) for annotating the sequence contexts of genomic variants, including breakpoint ambiguities, flanking bases of variants, wildtype/mutated DNA sequences, variant nomenclatures, distances between adjacent variants, tandem repeat regions, and custom annotation with user-customizable options. Our analyses demonstrate that VarSCAT is more versatile and customizable than the currently available methods or strategies for annotating variants in short tandem repeat (STR) regions or insertions and deletions (indels) with breakpoint ambiguity. Variant sequence context annotations of high-confidence human variant sets with VarSCAT revealed that more than 75% of all human individual germline and clinically relevant indels have breakpoint ambiguities. Moreover, we illustrate that more than 80% of human individual germline small variants in STR regions are indels and that the sizes of these indels correlate with STR motif sizes. VarSCAT is available from https://github.com/elolab/VarSCAT.
first_indexed 2024-03-11T21:53:39Z
format Article
id doaj.art-5c21fbe6370b43129193614cdb7f52fc
institution Directory Open Access Journal
issn 1553-734X
1553-7358
language English
last_indexed 2024-03-11T21:53:39Z
publishDate 2023-08-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS Computational Biology
spelling doaj.art-5c21fbe6370b43129193614cdb7f52fc; 2023-09-26T05:30:56Z; eng; Public Library of Science (PLoS); PLoS Computational Biology; 1553-734X; 1553-7358; 2023-08-01; 19(8): e1010727; 10.1371/journal.pcbi.1010727; https://doi.org/10.1371/journal.pcbi.1010727
title VarSCAT: A computational tool for sequence context annotations of genomic variants.
url https://doi.org/10.1371/journal.pcbi.1010727