DIscBIO: A User-Friendly Pipeline for Biomarker Discovery in Single-Cell Transcriptomics

The growing attention toward the benefits of single-cell RNA sequencing (scRNA-seq) is leading to a myriad of computational packages for the analysis of different aspects of scRNA-seq data. For researchers without advanced programming skills, it is very challenging to combine several packages in orde...

Bibliographic Details
Main Authors: Salim Ghannoum, Waldir Leoncio Netto, Damiano Fantini, Benjamin Ragan-Kelley, Amirabbas Parizadeh, Emma Jonasson, Anders Ståhlberg, Hesso Farhan, Alvaro Köhn-Luque
Format: Article
Language: English
Published: MDPI AG 2021-01-01
Series: International Journal of Molecular Sciences
Subjects: single-cell sequencing, normalization, gene filtering, ERCC spike-ins, biomarkers, DEGs
Online Access: https://www.mdpi.com/1422-0067/22/3/1399
author Salim Ghannoum
Waldir Leoncio Netto
Damiano Fantini
Benjamin Ragan-Kelley
Amirabbas Parizadeh
Emma Jonasson
Anders Ståhlberg
Hesso Farhan
Alvaro Köhn-Luque
collection DOAJ
description The growing attention toward the benefits of single-cell RNA sequencing (scRNA-seq) is leading to a myriad of computational packages for the analysis of different aspects of scRNA-seq data. For researchers without advanced programming skills, it is very challenging to combine several packages in order to perform the desired analysis in a simple and reproducible way. Here we present DIscBIO, an open-source, multi-algorithmic pipeline for easy, efficient and reproducible analysis of cellular sub-populations at the transcriptomic level. The pipeline integrates multiple scRNA-seq packages and allows biomarker discovery with decision trees and gene enrichment analysis in a network context using single-cell sequencing read counts through clustering and differential analysis. DIscBIO is freely available as an R package. It can be run either in command-line mode or through a user-friendly computational pipeline using Jupyter notebooks. We showcase all pipeline features using two scRNA-seq datasets. The first dataset consists of circulating tumor cells from patients with breast cancer. The second is a cell cycle regulation dataset in myxoid liposarcoma. All analyses are available as notebooks that integrate R code with explanatory text, output data and images in a sequential narrative. R users can use the notebooks to understand the different steps of the pipeline and as a guide for exploring their own scRNA-seq data. We also provide a cloud version using Binder that allows the execution of the pipeline without the need to download R, Jupyter or any of the packages used by the pipeline. The cloud version can serve as a tutorial for training purposes, especially for those who are not R users or have limited programming skills. However, in order to do meaningful scRNA-seq analyses, all users will need to understand the implemented methods and their possible options and limitations.
first_indexed 2024-03-09T03:15:00Z
format Article
id doaj.art-72ebfc4cec084d099dd309880374d8d8
institution Directory of Open Access Journals
issn 1661-6596
1422-0067
language English
last_indexed 2024-03-09T03:15:00Z
publishDate 2021-01-01
publisher MDPI AG
record_format Article
series International Journal of Molecular Sciences
spelling doaj.art-72ebfc4cec084d099dd309880374d8d8
2023-12-03T15:22:49Z
eng
MDPI AG
International Journal of Molecular Sciences
1661-6596
1422-0067
2021-01-01
Volume 22, Issue 3, Article 1399
10.3390/ijms22031399
DIscBIO: A User-Friendly Pipeline for Biomarker Discovery in Single-Cell Transcriptomics
Salim Ghannoum (Department of Molecular Medicine, Institute of Basic Medical Sciences, University of Oslo, 0372 Oslo, Norway)
Waldir Leoncio Netto (Oslo Centre for Biostatistics and Epidemiology, Faculty of Medicine, University of Oslo, 0372 Oslo, Norway)
Damiano Fantini (Department of Urology, Northwestern University, Chicago, IL 60611, USA)
Benjamin Ragan-Kelley (Simula Research Laboratory, 1325 Lysaker, Norway)
Amirabbas Parizadeh (Department of Molecular Medicine, Institute of Basic Medical Sciences, University of Oslo, 0372 Oslo, Norway)
Emma Jonasson (Sahlgrenska Center for Cancer Research, Department of Laboratory Medicine, Institute of Biomedicine, Sahlgrenska Academy at University of Gothenburg, SE-41390 Gothenburg, Sweden)
Anders Ståhlberg (Sahlgrenska Center for Cancer Research, Department of Laboratory Medicine, Institute of Biomedicine, Sahlgrenska Academy at University of Gothenburg, SE-41390 Gothenburg, Sweden)
Hesso Farhan (Department of Molecular Medicine, Institute of Basic Medical Sciences, University of Oslo, 0372 Oslo, Norway)
Alvaro Köhn-Luque (Oslo Centre for Biostatistics and Epidemiology, Faculty of Medicine, University of Oslo, 0372 Oslo, Norway)
https://www.mdpi.com/1422-0067/22/3/1399
single-cell sequencing
normalization
gene filtering
ERCC spike-ins
biomarkers
DEGs
title DIscBIO: A User-Friendly Pipeline for Biomarker Discovery in Single-Cell Transcriptomics
topic single-cell sequencing
normalization
gene filtering
ERCC spike-ins
biomarkers
DEGs
url https://www.mdpi.com/1422-0067/22/3/1399