DNA read count calibration for single-molecule, long-read sequencing

Abstract There are many applications in which quantitative information about DNA mixtures with different molecular lengths is important. Gene therapy vectors are much longer than can be sequenced individually via short-read NGS. However, vector preparations may contain smaller DNAs that behave diffe...

Bibliographic Details
Main Authors:	Luis M. M. Soares, Terrence Hanscom, Donald E. Selby, Samuel Adjei, Wei Wang, Dariusz Przybylski, John F. Thompson
Format:	Article
Language:	English
Published:	Nature Portfolio 2022-11-01
Series:	Scientific Reports
Online Access:	https://doi.org/10.1038/s41598-022-21606-5

author	Luis M. M. Soares, Terrence Hanscom, Donald E. Selby, Samuel Adjei, Wei Wang, Dariusz Przybylski, John F. Thompson
collection	DOAJ
description	Abstract There are many applications in which quantitative information about DNA mixtures with different molecular lengths is important. Gene therapy vectors are much longer than can be sequenced individually via short-read NGS. However, vector preparations may contain smaller DNAs that behave differently during sequencing. We have used two library preparations each for Pacific Biosciences (PacBio) and Oxford Nanopore Technologies NGS to determine their suitability for quantitative assessment of DNAs of varying sizes. Equimolar length standards were generated from E. coli genomic DNA. Both PacBio library preparations provided a consistent length dependence, though with a complex pattern. This method is sufficiently sensitive that differences in genomic copy number between DNA from E. coli grown in exponential and stationary phase conditions could be detected. The transposase-based Oxford Nanopore library preparation provided a predictable length dependence, but the random sequence starts caused the loss of original length information. The ligation-based approach retained length information, but read frequency was more variable. Modeling of E. coli versus lambda read frequency via cubic spline smoothing showed that the shorter genome could be used as a suitable internal spike-in for DNAs in the 200 bp to 10 kb range, allowing meaningful QC to be carried out with AAV preparations.
first_indexed	2024-04-12T08:33:53Z
format	Article
id	doaj.art-570c2962ec0448799d54758ad8db824e
institution	Directory Open Access Journal
issn	2045-2322
language	English
last_indexed	2024-04-12T08:33:53Z
publishDate	2022-11-01
publisher	Nature Portfolio
record_format	Article
series	Scientific Reports
spelling	doaj.art-570c2962ec0448799d54758ad8db824e; 2022-12-22T03:40:04Z; eng; Nature Portfolio; Scientific Reports; ISSN 2045-2322; 2022-11-01; vol. 12, no. 1, pp. 1-15; 10.1038/s41598-022-21606-5; DNA read count calibration for single-molecule, long-read sequencing; Luis M. M. Soares, Terrence Hanscom, Donald E. Selby, Samuel Adjei, Wei Wang, Dariusz Przybylski, John F. Thompson (all: Genomics and Computational Biology, Homology Medicines Inc); https://doi.org/10.1038/s41598-022-21606-5
title	DNA read count calibration for single-molecule, long-read sequencing
url	https://doi.org/10.1038/s41598-022-21606-5


Similar Items