Quality score compression improves genotyping accuracy

Bibliographic Details
Main Authors: Yu, Yun William; Yorukoglu, Deniz; Peng, Jian; Berger Leighton, Bonnie
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format: Article
Language: en_US
Published: Springer Nature 2016
Online Access: http://hdl.handle.net/1721.1/104079
https://orcid.org/0000-0002-8275-9576
https://orcid.org/0000-0003-2315-0768
https://orcid.org/0000-0002-2724-7228
Description
Summary: To the Editor: Most next-generation sequencing (NGS) quality scores are space intensive, redundant and often misleading. In this Correspondence, we recover quality information directly from sequence data using a compression tool named Quartz, rendering such scores redundant and yielding substantially better space and time efficiencies for storage and analysis. Quartz is designed to operate on NGS reads in FASTQ format, but it can be trivially modified to discard quality scores in other formats for which scores are paired with sequence information. Discarding 95% of quality scores resulted, counterintuitively, in improved SNP calling, implying that compression need not come at the expense of accuracy.