Kssd: sequence dimensionality reduction by k-mer substring space sampling enables real-time large-scale datasets analysis

Abstract: Here, we develop k-mer substring space decomposition (Kssd), a sketching technique which is significantly faster and more accurate than current sketching methods. We show that it is the only method that can be used for large-scale dataset comparisons at population resolution on simulated a...

Bibliographic Details
Main Authors: Huiguang Yi, Yanling Lin, Chengqi Lin, Wenfei Jin
Format: Article
Language: English
Published: BMC 2021-03-01
Series: Genome Biology
Online Access: https://doi.org/10.1186/s13059-021-02303-4