vamos: variable-number tandem repeats annotation using efficient motif sets


Bibliographic Details
Main Authors: Jingwen Ren, Bida Gu, Mark J. P. Chaisson
Format: Article
Language: English
Published: BMC 2023-07-01
Series: Genome Biology
Online Access: https://doi.org/10.1186/s13059-023-03010-y
Description
Summary: Roughly 3% of the human genome is composed of variable-number tandem repeats (VNTRs): arrays of motifs of at least six bases. These loci are highly polymorphic, yet current approaches that define and merge variants based on alignment breakpoints do not capture their full diversity. Here we present vamos (VNTR Annotation using efficient Motif Sets), a method that instead annotates VNTRs by repeat composition under different levels of motif diversity. Applied to 74 haplotype-resolved human assemblies, vamos estimates 7.4–16.7 alleles per locus, compared with the 4.0–5.5 alleles per locus estimated by breakpoint-based approaches.
ISSN: 1474-760X