Quality score compression improves genotyping accuracy
Main Authors: | Yu, Yun William; Yorukoglu, Deniz; Peng, Jian; Berger Leighton, Bonnie |
---|---|
Other Authors: | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory |
Format: | Article |
Language: | en_US |
Published: | Springer Nature, 2016 |
Online Access: | http://hdl.handle.net/1721.1/104079 https://orcid.org/0000-0002-8275-9576 https://orcid.org/0000-0003-2315-0768 https://orcid.org/0000-0002-2724-7228 |
_version_ | 1811087187955941376 |
---|---|
author | Yu, Yun William; Yorukoglu, Deniz; Peng, Jian; Berger Leighton, Bonnie |
author2 | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory |
author_sort | Yu, Yun William |
collection | MIT |
description | To the Editor: Most next-generation sequencing (NGS) quality scores are space intensive, redundant and often misleading. In this Correspondence, we recover quality information directly from sequence data using a compression tool named Quartz, rendering such scores redundant and yielding substantially better space and time efficiencies for storage and analysis. Quartz is designed to operate on NGS reads in FASTQ format, but it can be trivially modified to discard quality scores in other formats for which scores are paired with sequence information. Discarding 95% of quality scores resulted, counterintuitively, in improved SNP calling, implying that compression need not come at the expense of accuracy. |
first_indexed | 2024-09-23T13:41:23Z |
format | Article |
id | mit-1721.1/104079 |
institution | Massachusetts Institute of Technology |
language | en_US |
last_indexed | 2024-09-23T13:41:23Z |
publishDate | 2016 |
publisher | Springer Nature |
record_format | dspace |
spelling | mit-1721.1/104079; 2022-09-28T15:33:36Z; Quality score compression improves genotyping accuracy; Yu, Yun William; Yorukoglu, Deniz; Peng, Jian; Berger Leighton, Bonnie; Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Mathematics; Hertz Foundation; National Institutes of Health (U.S.) (NIH grant GM108348); 2016-08-30T21:02:51Z; 2015-03; Article; http://purl.org/eprint/type/JournalArticle; 1087-0156; 1546-1696; http://hdl.handle.net/1721.1/104079; Yu, Y William, Deniz Yorukoglu, Jian Peng, and Bonnie Berger. “Quality Score Compression Improves Genotyping Accuracy.” Nature Biotechnology 33, no. 3 (March 6, 2015): 240–243; https://orcid.org/0000-0002-8275-9576; https://orcid.org/0000-0003-2315-0768; https://orcid.org/0000-0002-2724-7228; en_US; http://dx.doi.org/10.1038/nbt.3170; Nature Biotechnology; Creative Commons Attribution-Noncommercial-Share Alike; http://creativecommons.org/licenses/by-nc-sa/4.0/; application/pdf; Springer Nature; PMC |
title | Quality score compression improves genotyping accuracy |