Quality score compression improves genotyping accuracy

To the Editor: Most next-generation sequencing (NGS) quality scores are space intensive, redundant and often misleading. In this Correspondence, we recover quality information directly from sequence data using a compression tool named Quartz, rendering such scores redundant and yielding substantial...


Bibliographic Details
Main Authors: Yu, Yun William, Yorukoglu, Deniz, Peng, Jian, Berger Leighton, Bonnie
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format: Article
Language: en_US
Published: Springer Nature 2016
Online Access: http://hdl.handle.net/1721.1/104079
https://orcid.org/0000-0002-8275-9576
https://orcid.org/0000-0003-2315-0768
https://orcid.org/0000-0002-2724-7228
author Yu, Yun William
Yorukoglu, Deniz
Peng, Jian
Berger Leighton, Bonnie
author2 Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
author_sort Yu, Yun William
collection MIT
description To the Editor: Most next-generation sequencing (NGS) quality scores are space intensive, redundant and often misleading. In this Correspondence, we recover quality information directly from sequence data using a compression tool named Quartz, rendering such scores redundant and yielding substantially better space and time efficiencies for storage and analysis. Quartz is designed to operate on NGS reads in FASTQ format, but it can be trivially modified to discard quality scores in other formats for which scores are paired with sequence information. Discarding 95% of quality scores resulted, counterintuitively, in improved SNP calling, implying that compression need not come at the expense of accuracy.
first_indexed 2024-09-23T13:41:23Z
format Article
id mit-1721.1/104079
institution Massachusetts Institute of Technology
language en_US
last_indexed 2024-09-23T13:41:23Z
publishDate 2016
publisher Springer Nature
record_format dspace
spelling mit-1721.1/104079
2022-09-28T15:33:36Z
Quality score compression improves genotyping accuracy
Yu, Yun William
Yorukoglu, Deniz
Peng, Jian
Berger Leighton, Bonnie
Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Massachusetts Institute of Technology. Department of Mathematics
Yu, Yun William
Yorukoglu, Deniz
Peng, Jian
Berger Leighton, Bonnie
To the Editor: Most next-generation sequencing (NGS) quality scores are space intensive, redundant and often misleading. In this Correspondence, we recover quality information directly from sequence data using a compression tool named Quartz, rendering such scores redundant and yielding substantially better space and time efficiencies for storage and analysis. Quartz is designed to operate on NGS reads in FASTQ format, but it can be trivially modified to discard quality scores in other formats for which scores are paired with sequence information. Discarding 95% of quality scores resulted, counterintuitively, in improved SNP calling, implying that compression need not come at the expense of accuracy.
Hertz Foundation
National Institutes of Health (U.S.) (NIH grant GM108348)
2016-08-30T21:02:51Z
2016-08-30T21:02:51Z
2015-03
Article
http://purl.org/eprint/type/JournalArticle
1087-0156
1546-1696
http://hdl.handle.net/1721.1/104079
Yu, Y William, Deniz Yorukoglu, Jian Peng, and Bonnie Berger. “Quality Score Compression Improves Genotyping Accuracy.” Nature Biotechnology 33, no. 3 (March 6, 2015): 240–243.
https://orcid.org/0000-0002-8275-9576
https://orcid.org/0000-0003-2315-0768
https://orcid.org/0000-0002-2724-7228
en_US
http://dx.doi.org/10.1038/nbt.3170
Nature Biotechnology
Creative Commons Attribution-Noncommercial-Share Alike
http://creativecommons.org/licenses/by-nc-sa/4.0/
application/pdf
Springer Nature
PMC
title Quality score compression improves genotyping accuracy