The quantitative impact of read mapping to non-native reference genomes in comparative RNA-Seq studies.

Sequence read alignment to a reference genome is a fundamental step in many genomics studies. Accuracy in this fundamental step is crucial for correct interpretation of biological data. In cases where two or more closely related bacterial strains are being studied, a common approach is to simply map...

Bibliographic Details
Main Authors: Adam Price, Cynthia Gibas
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2017-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC5507458?pdf=render
collection DOAJ
description Sequence read alignment to a reference genome is a fundamental step in many genomics studies. Accuracy in this fundamental step is crucial for correct interpretation of biological data. In cases where two or more closely related bacterial strains are being studied, a common approach is to simply map reads from all strains to a common reference genome, whether because there is no closed reference for some strains or for ease of comparison. The assumption is that the differences between bacterial strains are insignificant enough that the results of differential expression analysis will not be influenced by choice of reference. Genes that are common among the strains under study are used for differential expression analysis, while the remaining genes, which may fail to express in one sample or the other because they are simply absent, are analyzed separately. In this study, we investigate the practice of using a common reference in transcriptomic analysis. We analyze two multi-strain transcriptomic data sets that were initially presented in the literature as comparisons based on a common reference, but which have available closed genomic sequence for all strains, allowing a detailed examination of the impact of reference choice. We provide a method for identifying regions that are most affected by non-native alignments, leading to false positives in differential expression analysis, and perform an in-depth analysis identifying the extent of expression loss. We also simulate several data sets to identify best practices for non-native reference use.
first_indexed 2024-12-21T17:29:08Z
id doaj.art-84f4d9e9638c4ca1b7e4d4f7767bf0e3
institution Directory Open Access Journal
issn 1932-6203
last_indexed 2024-12-21T17:29:08Z
doi 10.1371/journal.pone.0180904
citation PLoS ONE 12(7): e0180904 (2017-01-01)