Sequence deeper without sequencing more: Bayesian resolution of ambiguously mapped reads.


Bibliographic Details
Main Authors: Rohan N Shah, Alexander J Ruthenburg
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2021-04-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1008926
collection DOAJ
description Next-generation sequencing (NGS) has transformed molecular biology and contributed to many seminal insights into genomic regulation and function. Apart from whole-genome sequencing, an NGS workflow involves alignment of the sequencing reads to the genome of study, after which the resulting alignments can be used for downstream analyses. However, alignment is complicated by repetitive sequences: many reads align to more than one genomic locus, and 15-30% of the genome is not uniquely mappable by short-read NGS. This problem is typically addressed by discarding reads that do not map uniquely to the genome, but this practice can systematically distort the data. Previous methods for handling ambiguously mapped reads were often of limited applicability or computationally intensive, which hindered their broader use. In this work, we present SmartMap: an algorithm that augments industry-standard aligners to enable the use of ambiguously mapped reads by assigning a weight to each alignment through Bayesian analysis of the read distribution and alignment quality. SmartMap is computationally efficient: it uses far fewer weighting iterations than previously thought necessary and, as a result, analyzes more than a billion alignments of NGS reads in approximately one hour on a desktop PC. By applying SmartMap to peak-type NGS data, including MNase-seq, ChIP-seq, and ATAC-seq in three organisms, we can increase read depth by up to 53% and increase the mapped proportion of the genome by up to 18% compared to analyses that use only uniquely mapped reads. We further show that SmartMap enables the analysis of more than 140,000 repetitive elements that could not be analyzed by traditional ChIP-seq workflows, and we use this method to gain insight into the epigenetic regulation of different classes of repetitive elements. These data emphasize both the dangers of discarding ambiguously mapped reads and their power for driving biological discovery.
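The weighting idea in the description can be illustrated with a small sketch. This is not SmartMap's implementation; it is a generic iterative reweighting under assumptions made here for illustration: each read carries a list of candidate alignments as `(chrom, pos, likelihood)` tuples, the prior for a locus is the weighted read coverage in a fixed-size bin, and the names `reweight_alignments`, `bin_size`, and `pseudocount` are hypothetical.

```python
from collections import defaultdict


def reweight_alignments(reads, n_iterations=2, bin_size=100, pseudocount=1.0):
    """Assign a weight to every candidate alignment of every read.

    reads: list of reads; each read is a list of (chrom, pos, likelihood)
           tuples, one per candidate alignment (hypothetical input format).
    Returns a list of weight lists; the weights of each read sum to 1.
    """
    # Initial weights: proportional to alignment likelihood alone.
    weights = []
    for alns in reads:
        total = sum(lik for _, _, lik in alns)
        weights.append([lik / total for _, _, lik in alns])

    for _ in range(n_iterations):
        # Prior: weighted coverage accumulated per genomic bin.
        coverage = defaultdict(float)
        for alns, ws in zip(reads, weights):
            for (chrom, pos, _), w in zip(alns, ws):
                coverage[(chrom, pos // bin_size)] += w

        # Posterior weight is proportional to likelihood times local coverage.
        # The pseudocount keeps sparsely covered loci from collapsing to zero.
        new_weights = []
        for alns in reads:
            scores = [
                lik * (coverage[(chrom, pos // bin_size)] + pseudocount)
                for chrom, pos, lik in alns
            ]
            total = sum(scores)
            new_weights.append([s / total for s in scores])
        weights = new_weights

    return weights


# Three uniquely mapped reads near chr1:100 and one read that aligns
# equally well to chr1 and chr2.
example_reads = [
    [("chr1", 100, 1.0)],
    [("chr1", 120, 1.0)],
    [("chr1", 150, 1.0)],
    [("chr1", 120, 1.0), ("chr2", 500, 1.0)],
]
example_weights = reweight_alignments(example_reads)
print(example_weights[3])
```

In this toy input the ambiguous read ends up weighted toward the chr1 locus, because the uniquely mapped reads already pile up there. Uniquely mapped reads keep a weight of 1 throughout.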
institution Directory Open Access Journal
issn 1553-734X
1553-7358
citation Rohan N Shah, Alexander J Ruthenburg. Sequence deeper without sequencing more: Bayesian resolution of ambiguously mapped reads. PLoS Computational Biology 17(4): e1008926 (2021-04-01). https://doi.org/10.1371/journal.pcbi.1008926