Maximizing the potential of high-throughput next-generation sequencing through precise normalization based on read count distribution

ABSTRACT Next-generation sequencing technologies have enabled many advances across diverse areas of biology, with many benefiting from increased sample size. Although the cost of running next-generation sequencing instruments has dropped substantially over time, the cost of sample preparation method...

Bibliographic Details
Main Authors: Caitriona Brennan, Rodolfo A. Salido, Pedro Belda-Ferre, MacKenzie Bryant, Charles Cowart, Maria D. Tiu, Antonio González, Daniel McDonald, Caitlin Tribelhorn, Amir Zarrinpar, Rob Knight
Format: Article
Language: English
Published: American Society for Microbiology 2023-08-01
Series: mSystems
Subjects: metagenomics, large-scale studies, NGS normalization, automation, multiplexing, quantification
Online Access: https://journals.asm.org/doi/10.1128/msystems.00006-23
author Caitriona Brennan
Rodolfo A. Salido
Pedro Belda-Ferre
MacKenzie Bryant
Charles Cowart
Maria D. Tiu
Antonio González
Daniel McDonald
Caitlin Tribelhorn
Amir Zarrinpar
Rob Knight
author_sort Caitriona Brennan
collection DOAJ
description ABSTRACT Next-generation sequencing technologies have enabled many advances across diverse areas of biology, with many benefiting from increased sample size. Although the cost of running next-generation sequencing instruments has dropped substantially over time, the cost of sample preparation methods has lagged behind. To counter this, researchers have adapted library miniaturization protocols and large sample pools to maximize the number of samples that can be prepared with a given amount of reagents and sequenced in a single run. However, because sample quality is highly variable, over- and underrepresentation of samples in a sequencing run has become a major issue in high-throughput sequencing. This leads to misinterpretation of results due to increased noise, and to additional time and cost for rerunning underrepresented samples. To overcome this problem, we present a normalization method that uses shallow iSeq sequencing to accurately inform pooling volumes based on read distribution. This method is superior to the widely used fluorometry methods, which cannot specifically target adapter-ligated molecules that contribute to sequencing output. Our normalization method not only quantifies adapter-ligated molecules but also allows normalization of feature space; for example, we can normalize to reads of interest such as non-ribosomal reads. As a result, this normalization method improves the efficiency of high-throughput next-generation sequencing by reducing noise and producing higher average reads per sample with more even sequencing depth.
IMPORTANCE High-throughput next-generation sequencing (NGS) has significantly contributed to the field of genomics; however, further improvements can maximize the potential of this important tool. Uneven sequencing of samples in a multiplexed run is a common issue that leads to unexpected extra costs or low-quality data. To mitigate this problem, we introduce a normalization method based on read counts rather than library concentration. This method allows for an even distribution of features of interest across samples, improving the statistical power of data sets and preventing the financial loss associated with resequencing libraries. This method optimizes NGS, which already has huge importance across many areas of biology.
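The idea of letting shallow-run read counts inform pooling volumes can be illustrated with a minimal sketch. It is not the authors' published implementation: the function name `pooling_volumes`, the inverse-proportional rule, the median reference, and the volume limits are all assumptions made for illustration. It also assumes that every library was pooled at equal volume for the shallow run, so that a library's read count reflects its share of sequenceable molecules.

```python
from statistics import median
from typing import Dict


def pooling_volumes(
    read_counts: Dict[str, int],
    target_nl: float = 100.0,   # hypothetical volume for a sample with median reads
    min_nl: float = 25.0,       # hypothetical lower pipetting limit
    max_nl: float = 1000.0,     # hypothetical upper pipetting limit
) -> Dict[str, float]:
    """Return a pooling volume (nL) per sample, inversely proportional to its
    read count in the shallow run, clamped to the pipetting limits."""
    positive = [count for count in read_counts.values() if count > 0]
    if not positive:
        raise ValueError("no sample produced any reads in the shallow run")
    reference = median(positive)

    volumes = {}
    for sample, count in read_counts.items():
        if count <= 0:
            # No reads observed: pool the largest volume the limits allow.
            volumes[sample] = max_nl
            continue
        raw = target_nl * reference / count
        volumes[sample] = min(max(raw, min_nl), max_nl)
    return volumes


if __name__ == "__main__":
    shallow_counts = {"A": 1000, "B": 2000, "C": 500, "D": 0}
    for sample, volume in pooling_volumes(shallow_counts).items():
        print(f"{sample}: {volume:.1f} nL")
```

To normalize on reads of interest instead of total reads, as the abstract describes for non-ribosomal reads, the same function could be given per-sample counts of only those reads.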
first_indexed 2024-03-12T11:42:49Z
format Article
id doaj.art-ad7c5f3d3cf14b89bb7c978b9cc196e7
institution Directory Open Access Journal
issn 2379-5077
language English
last_indexed 2024-03-12T11:42:49Z
publishDate 2023-08-01
publisher American Society for Microbiology
record_format Article
series mSystems
spelling doaj.art-ad7c5f3d3cf14b89bb7c978b9cc196e7 2023-08-31T13:00:43Z eng American Society for Microbiology mSystems 2379-5077 2023-08-01 8 4 10.1128/msystems.00006-23 Maximizing the potential of high-throughput next-generation sequencing through precise normalization based on read count distribution
Caitriona Brennan, Department of Pediatrics, University of California San Diego, La Jolla, California, USA
Rodolfo A. Salido, Department of Bioengineering, University of California San Diego, La Jolla, California, USA
Pedro Belda-Ferre, Department of Pediatrics, University of California San Diego, La Jolla, California, USA
MacKenzie Bryant, Department of Pediatrics, University of California San Diego, La Jolla, California, USA
Charles Cowart, Department of Pediatrics, University of California San Diego, La Jolla, California, USA
Maria D. Tiu, Division of Gastroenterology, University of California San Diego, La Jolla, California, USA
Antonio González, Department of Pediatrics, University of California San Diego, La Jolla, California, USA
Daniel McDonald, Department of Pediatrics, University of California San Diego, La Jolla, California, USA
Caitlin Tribelhorn, Department of Pediatrics, University of California San Diego, La Jolla, California, USA
Amir Zarrinpar, Division of Gastroenterology, University of California San Diego, La Jolla, California, USA
Rob Knight, Department of Pediatrics, University of California San Diego, La Jolla, California, USA
title Maximizing the potential of high-throughput next-generation sequencing through precise normalization based on read count distribution
topic metagenomics
large-scale studies
NGS normalization
automation
multiplexing
quantification
url https://journals.asm.org/doi/10.1128/msystems.00006-23