Discordant Genome Assemblies Drastically Alter the Interpretation of Single-Cell RNA Sequencing Data Which Can Be Mitigated by a Novel Integration Method


Bibliographic Details
Main Authors: Helen G. Potts, Madeleine E. Lemieux, Edward S. Rice, Wesley Warren, Robin P. Choudhury, Mathilda T. M. Mommersteeg
Format: Article
Language: English
Published: MDPI AG 2022-02-01
Series: Cells
Online Access:https://www.mdpi.com/2073-4409/11/4/608
Collection: DOAJ
Description: Advances in sequencing and assembly technology have led to the creation of genome assemblies for a wide variety of non-model organisms. The rapid production and proliferation of updated, novel assembly versions can create vexing problems for researchers when multiple genome assembly versions are available at once, requiring them to work with more than one reference genome. Multiple genome assemblies are especially problematic for researchers studying individual cells, as single-cell RNA sequencing (scRNAseq) requires sequenced reads to be mapped and aligned to a single reference genome. Using Astyanax mexicanus, this study highlights how the interpretation of a single-cell dataset from the same sample changes when it is aligned to each of the species' two available genome assemblies. We found that the numbers of cells and expressed genes detected differed drastically depending on which assembly the reads were aligned to. When each genome assembly was used in isolation with its respective annotation, cell-type identification was confounded, as some classic cell-type markers were assembly-specific, whilst other genes showed differential patterns of expression between the two assemblies. To overcome the problems posed by multiple genome assemblies, we propose that researchers align to each available assembly and then integrate the resulting datasets to produce a final dataset in which all genome alignments can be used simultaneously. We found that this approach increased the accuracy of cell-type identification and maximised the amount of data that could be extracted from our single-cell sample by capturing all possible cells and transcripts. As scRNAseq becomes more widely available, it is imperative that the single-cell community is aware of how genome assembly alignment can alter single-cell data and their interpretation, especially when reviewing studies on non-model organisms.
Volume/Issue: 11(4), article 608
DOI: 10.3390/cells11040608
ISSN: 2073-4409
Affiliations:
Helen G. Potts: Burdon Sanderson Cardiac Science Centre, Department of Physiology, Anatomy & Genetics, University of Oxford, Oxford OX1 3PT, UK
Madeleine E. Lemieux: Bioinfo, Plantagenet, ON K0B 1L0, Canada
Edward S. Rice: Department of Animal Sciences, Bond Life Sciences Center, University of Missouri, Columbia, MO 65201, USA
Wesley Warren: Department of Animal Sciences, Bond Life Sciences Center, University of Missouri, Columbia, MO 65201, USA
Robin P. Choudhury: Division of Cardiovascular Medicine, University of Oxford, Oxford OX3 9DU, UK
Mathilda T. M. Mommersteeg: Burdon Sanderson Cardiac Science Centre, Department of Physiology, Anatomy & Genetics, University of Oxford, Oxford OX1 3PT, UK
Keywords: genome assembly; Astyanax mexicanus; integration; Seurat; read alignment; non-model organisms