Phylogenetic Heatmaps Highlight Composition Biases in Sequenced Reads

Due to advancements in sequencing technology, sequence data production is no longer a constraint in the field of microbiology and has made it possible to study uncultured microbes or whole environments using metagenomics. However, these new technologies introduce different biases in metagenomic sequ...

Bibliographic Details
Main Authors: Sulbha Choudhari, Andrey Grigoriev
Format: Article
Language: English
Published: MDPI AG 2017-01-01
Series: Microorganisms
Subjects: nucleotide composition, metagenomics, sequencing bias, computational analysis, genome sequencing
Online Access: http://www.mdpi.com/2076-2607/5/1/4
_version_ 1818272445189062656
author Sulbha Choudhari
Andrey Grigoriev
author_facet Sulbha Choudhari
Andrey Grigoriev
author_sort Sulbha Choudhari
collection DOAJ
description Advances in sequencing technology mean that sequence data production is no longer a constraint in microbiology, making it possible to study uncultured microbes or whole environments using metagenomics. However, these new technologies introduce different biases in metagenomic sequencing, affecting the nucleotide distribution of the resulting sequence reads. Here, we illustrate such biases using two methods. One is based on phylogenetic heatmaps (PGHMs), a novel approach for compact visualization of sequence composition differences between two groups of sequences containing the same phylogenetic groups. This method is well suited for finding noise and biases when comparing metagenomics samples. We apply PGHMs to detect noise and bias in data produced with different DNA extraction protocols, different sequencing platforms and different experimental frameworks. In parallel, we use principal component analysis, which displays different clustering of sequences from each sample, to support our findings and illustrate the utility of PGHMs. We considered the contributions of read length and GC-content variation and observed that in most cases the biases were due to the GC-content of the reads.
first_indexed 2024-12-12T21:42:11Z
format Article
id doaj.art-72b8d2de60364fa7aeec34c253ce9f58
institution Directory Open Access Journal
issn 2076-2607
language English
last_indexed 2024-12-12T21:42:11Z
publishDate 2017-01-01
publisher MDPI AG
record_format Article
series Microorganisms
spelling doaj.art-72b8d2de60364fa7aeec34c253ce9f58; 2022-12-22T00:11:01Z; eng; MDPI AG; Microorganisms; 2076-2607; 2017-01-01; 5; 1; 4; 10.3390/microorganisms5010004; microorganisms5010004; Phylogenetic Heatmaps Highlight Composition Biases in Sequenced Reads; Sulbha Choudhari (0); Andrey Grigoriev (1); (0) Department of Biology, Center for Computational and Integrative Biology, Rutgers University, Camden, NJ 08102, USA; (1) Department of Biology, Center for Computational and Integrative Biology, Rutgers University, Camden, NJ 08102, USA; http://www.mdpi.com/2076-2607/5/1/4; nucleotide composition; metagenomics; sequencing bias; computational analysis; genome sequencing
spellingShingle Sulbha Choudhari
Andrey Grigoriev
Phylogenetic Heatmaps Highlight Composition Biases in Sequenced Reads
Microorganisms
nucleotide composition
metagenomics
sequencing bias
computational analysis
genome sequencing
title Phylogenetic Heatmaps Highlight Composition Biases in Sequenced Reads
title_full Phylogenetic Heatmaps Highlight Composition Biases in Sequenced Reads
title_fullStr Phylogenetic Heatmaps Highlight Composition Biases in Sequenced Reads
title_full_unstemmed Phylogenetic Heatmaps Highlight Composition Biases in Sequenced Reads
title_short Phylogenetic Heatmaps Highlight Composition Biases in Sequenced Reads
title_sort phylogenetic heatmaps highlight composition biases in sequenced reads
topic nucleotide composition
metagenomics
sequencing bias
computational analysis
genome sequencing
url http://www.mdpi.com/2076-2607/5/1/4
work_keys_str_mv AT sulbhachoudhari phylogeneticheatmapshighlightcompositionbiasesinsequencedreads
AT andreygrigoriev phylogeneticheatmapshighlightcompositionbiasesinsequencedreads