DNA read count calibration for single-molecule, long-read sequencing
Abstract There are many applications in which quantitative information about DNA mixtures with different molecular lengths is important. Gene therapy vectors are much longer than can be sequenced individually via short-read NGS. However, vector preparations may contain smaller DNAs that behave diffe...
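Main Authors: | Luis M. M. Soares, Terrence Hanscom, Donald E. Selby, Samuel Adjei, Wei Wang, Dariusz Przybylski, John F. Thompson |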
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio 2022-11-01 |
Series: | Scientific Reports |
Online Access: | https://doi.org/10.1038/s41598-022-21606-5 |
_version_ | 1811223510715990016 |
---|---|
author | Luis M. M. Soares Terrence Hanscom Donald E. Selby Samuel Adjei Wei Wang Dariusz Przybylski John F. Thompson |
author_sort | Luis M. M. Soares |
collection | DOAJ |
description | Abstract There are many applications in which quantitative information about DNA mixtures with different molecular lengths is important. Gene therapy vectors are much longer than can be sequenced individually via short-read NGS. However, vector preparations may contain smaller DNAs that behave differently during sequencing. We have used two library preparations each for Pacific Biosciences (PacBio) and Oxford Nanopore Technologies NGS to determine their suitability for quantitative assessment of DNAs of varying size. Equimolar length standards were generated from E. coli genomic DNA. Both PacBio library preparations provided a consistent length dependence, though with a complex pattern. This method is sufficiently sensitive that differences in genomic copy number between DNA from E. coli grown in exponential and stationary phase conditions could be detected. The transposase-based Oxford Nanopore library preparation provided a predictable length dependence, but the random sequence starts caused the loss of original length information. The ligation-based approach retained length information, but read frequency was more variable. Modeling of E. coli versus lambda read frequency via cubic spline smoothing showed that the shorter genome could be used as a suitable internal spike-in for DNAs in the 200 bp to 10 kb range, allowing meaningful QC to be carried out with AAV preparations. |
first_indexed | 2024-04-12T08:33:53Z |
format | Article |
id | doaj.art-570c2962ec0448799d54758ad8db824e |
institution | Directory Open Access Journal |
issn | 2045-2322 |
language | English |
last_indexed | 2024-04-12T08:33:53Z |
publishDate | 2022-11-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Reports |
spelling | doaj.art-570c2962ec0448799d54758ad8db824e; 2022-12-22T03:40:04Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2022-11-01; 121115; 10.1038/s41598-022-21606-5; DNA read count calibration for single-molecule, long-read sequencing; Luis M. M. Soares (0), Terrence Hanscom (1), Donald E. Selby (2), Samuel Adjei (3), Wei Wang (4), Dariusz Przybylski (5), John F. Thompson (6); all authors: Genomics and Computational Biology, Homology Medicines Inc; abstract as in description; https://doi.org/10.1038/s41598-022-21606-5 |
title | DNA read count calibration for single-molecule, long-read sequencing |
url | https://doi.org/10.1038/s41598-022-21606-5 |
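
The abstract describes using equimolar length standards to characterize how read frequency depends on DNA length, then using that dependence to interpret read counts from mixed-length samples. The following is a minimal sketch of that idea, not the authors' method: it substitutes simple log-log linear interpolation for the cubic spline smoothing named in the abstract, and the function names (`relative_efficiency`, `efficiency_at`, `calibrate`) and all numbers are hypothetical, chosen only for illustration.

```python
from bisect import bisect_left
from math import exp, log


def relative_efficiency(standard_counts):
    """Normalise read counts of equimolar standards to their mean.

    standard_counts maps fragment length (bp) to a positive read count.
    Because the standards are assumed equimolar, deviation from 1.0 is
    attributed to length-dependent bias rather than to abundance.
    """
    mean = sum(standard_counts.values()) / len(standard_counts)
    return {length: count / mean for length, count in standard_counts.items()}


def efficiency_at(length, efficiency):
    """Interpolate efficiency at `length`, linearly in log-log space.

    Lengths outside the range of the standards are clamped to the nearest
    standard (an assumption of this sketch; no extrapolation is attempted).
    """
    lengths = sorted(efficiency)
    if length <= lengths[0]:
        return efficiency[lengths[0]]
    if length >= lengths[-1]:
        return efficiency[lengths[-1]]
    i = bisect_left(lengths, length)
    lo, hi = lengths[i - 1], lengths[i]
    t = (log(length) - log(lo)) / (log(hi) - log(lo))
    return exp((1 - t) * log(efficiency[lo]) + t * log(efficiency[hi]))


def calibrate(sample_counts, efficiency):
    """Divide each observed count by the interpolated efficiency for its length."""
    return {
        length: count / efficiency_at(length, efficiency)
        for length, count in sample_counts.items()
    }


# Hypothetical equimolar standards: length (bp) -> observed reads.
standards = {200: 50, 1000: 100, 5000: 150}
eff = relative_efficiency(standards)

# Hypothetical sample counts at two lengths.
corrected = calibrate({200: 10, 1000: 80}, eff)
```

In this toy input the 200 bp standard is read at half the mean rate, so a sample count at 200 bp is doubled after correction, while a count at 1000 bp is left unchanged. A smoothing spline would replace `efficiency_at` when the standards are noisy or densely sampled.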