DIscBIO: A User-Friendly Pipeline for Biomarker Discovery in Single-Cell Transcriptomics
The growing attention toward the benefits of single-cell RNA sequencing (scRNA-seq) is leading to a myriad of computational packages for the analysis of different aspects of scRNA-seq data. For researchers without advanced programing skills, it is very challenging to combine several packages in orde...
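Main Authors: | Salim Ghannoum, Waldir Leoncio Netto, Damiano Fantini, Benjamin Ragan-Kelley, Amirabbas Parizadeh, Emma Jonasson, Anders Ståhlberg, Hesso Farhan, Alvaro Köhn-Luque |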
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-01-01 |
Series: | International Journal of Molecular Sciences |
Subjects: | single-cell sequencing; normalization; gene filtering; ERCC spike-ins; biomarkers; DEGs |
Online Access: | https://www.mdpi.com/1422-0067/22/3/1399 |
_version_ | 1797405772789841920 |
---|---|
author | Salim Ghannoum; Waldir Leoncio Netto; Damiano Fantini; Benjamin Ragan-Kelley; Amirabbas Parizadeh; Emma Jonasson; Anders Ståhlberg; Hesso Farhan; Alvaro Köhn-Luque |
author_facet | Salim Ghannoum; Waldir Leoncio Netto; Damiano Fantini; Benjamin Ragan-Kelley; Amirabbas Parizadeh; Emma Jonasson; Anders Ståhlberg; Hesso Farhan; Alvaro Köhn-Luque |
author_sort | Salim Ghannoum |
collection | DOAJ |
description | The growing attention toward the benefits of single-cell RNA sequencing (scRNA-seq) is leading to a myriad of computational packages for the analysis of different aspects of scRNA-seq data. For researchers without advanced programing skills, it is very challenging to combine several packages in order to perform the desired analysis in a simple and reproducible way. Here we present DIscBIO, an open-source, multi-algorithmic pipeline for easy, efficient and reproducible analysis of cellular sub-populations at the transcriptomic level. The pipeline integrates multiple scRNA-seq packages and allows biomarker discovery with decision trees and gene enrichment analysis in a network context using single-cell sequencing read counts through clustering and differential analysis. DIscBIO is freely available as an R package. It can be run either in command-line mode or through a user-friendly computational pipeline using Jupyter notebooks. We showcase all pipeline features using two scRNA-seq datasets. The first dataset consists of circulating tumor cells from patients with breast cancer. The second one is a cell cycle regulation dataset in myxoid liposarcoma. All analyses are available as notebooks that integrate R code with explanatory text, output data and images in a sequential narrative. R users can use the notebooks to understand the different steps of the pipeline, and the notebooks will guide them in exploring their own scRNA-seq data. We also provide a cloud version using Binder that allows the execution of the pipeline without the need to download R, Jupyter or any of the packages used by the pipeline. The cloud version can serve as a tutorial for training purposes, especially for those who are not R users or have limited programing skills. However, in order to do meaningful scRNA-seq analyses, all users will need to understand the implemented methods and their possible options and limitations. |
first_indexed | 2024-03-09T03:15:00Z |
format | Article |
id | doaj.art-72ebfc4cec084d099dd309880374d8d8 |
institution | Directory of Open Access Journals |
issn | 1661-6596 1422-0067 |
language | English |
last_indexed | 2024-03-09T03:15:00Z |
publishDate | 2021-01-01 |
publisher | MDPI AG |
record_format | Article |
series | International Journal of Molecular Sciences |
spelling | doaj.art-72ebfc4cec084d099dd309880374d8d8; 2023-12-03T15:22:49Z; eng; MDPI AG; International Journal of Molecular Sciences; ISSN 1661-6596, 1422-0067; 2021-01-01; vol. 22, no. 3, article 1399; DOI 10.3390/ijms22031399; DIscBIO: A User-Friendly Pipeline for Biomarker Discovery in Single-Cell Transcriptomics; Authors: Salim Ghannoum (0), Waldir Leoncio Netto (1), Damiano Fantini (2), Benjamin Ragan-Kelley (3), Amirabbas Parizadeh (4), Emma Jonasson (5), Anders Ståhlberg (6), Hesso Farhan (7), Alvaro Köhn-Luque (8); Affiliations: (0) Department of Molecular Medicine, Institute of Basic Medical Sciences, University of Oslo, 0372 Oslo, Norway; (1) Oslo Centre for Biostatistics and Epidemiology, Faculty of Medicine, University of Oslo, 0372 Oslo, Norway; (2) Department of Urology, Northwestern University, Chicago, IL 60611, USA; (3) Simula Research Laboratory, 1325 Lysaker, Norway; (4) Department of Molecular Medicine, Institute of Basic Medical Sciences, University of Oslo, 0372 Oslo, Norway; (5) Sahlgrenska Center for Cancer Research, Department of Laboratory Medicine, Institute of Biomedicine, Sahlgrenska Academy at University of Gothenburg, SE-41390 Gothenburg, Sweden; (6) Sahlgrenska Center for Cancer Research, Department of Laboratory Medicine, Institute of Biomedicine, Sahlgrenska Academy at University of Gothenburg, SE-41390 Gothenburg, Sweden; (7) Department of Molecular Medicine, Institute of Basic Medical Sciences, University of Oslo, 0372 Oslo, Norway; (8) Oslo Centre for Biostatistics and Epidemiology, Faculty of Medicine, University of Oslo, 0372 Oslo, Norway; https://www.mdpi.com/1422-0067/22/3/1399; Keywords: single-cell sequencing, normalization, gene filtering, ERCC spike-ins, biomarkers, DEGs |
title | DIscBIO: A User-Friendly Pipeline for Biomarker Discovery in Single-Cell Transcriptomics |
topic | single-cell sequencing; normalization; gene filtering; ERCC spike-ins; biomarkers; DEGs |
url | https://www.mdpi.com/1422-0067/22/3/1399 |