Barcode identification for single cell genomics

Abstract Background Single-cell sequencing experiments use short DNA barcode ‘tags’ to identify reads that originate from the same cell. In order to recover single-cell information from such experiments, reads must be grouped based on their barcode tag, a crucial processing step that precedes other...

Bibliographic Details
Main Authors: Akshay Tambe, Lior Pachter
Format: Article
Language: English
Published: BMC 2019-01-01
Series: BMC Bioinformatics
Subjects: Single-cell; Barcodes; Barcode identification; de Bruijn graph; Circularization; K-mer counting
Online Access: http://link.springer.com/article/10.1186/s12859-019-2612-0
collection DOAJ
description Abstract. Background: Single-cell sequencing experiments use short DNA barcode ‘tags’ to identify reads that originate from the same cell. To recover single-cell information from such experiments, reads must be grouped by their barcode tag, a crucial processing step that precedes other computations. However, this step can be difficult because barcodes are subject to high rates of mismatch and deletion errors. Results: Here we present an approach to identify and error-correct barcodes by traversing the de Bruijn graph of circularized barcode k-mers. Our approach is based on the observation that circularizing a barcode sequence can yield error-free k-mers even when k is large relative to the length of the barcode sequence, a regime that is typical of single-cell barcoding applications. This allows reads to be assigned to consensus fingerprints constructed from k-mers. Conclusion: We show that for single-cell RNA-Seq, circularization improves the recovery of accurate single-cell transcriptome estimates, especially when the number of errors per read is high. This approach is robust to the type of error (mismatch, insertion, deletion), as well as to the relative abundances of the cells. Sircel, a software package that implements this approach, is described and publicly available.
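The circularization observation in the abstract can be illustrated with a minimal Python sketch. It is not taken from Sircel: the function names, the 12-nt barcode, and the choice k = 8 are all assumptions made for illustration. It only shows why a single error can destroy every linear k-mer of a short barcode while some circular k-mers survive; it does not build or traverse a de Bruijn graph.

```python
def linear_kmers(barcode: str, k: int) -> list[str]:
    """All k-mers of the barcode read as an ordinary (linear) string."""
    return [barcode[i:i + k] for i in range(len(barcode) - k + 1)]


def circular_kmers(barcode: str, k: int) -> list[str]:
    """All k-mers of the barcode read as a circle (the end wraps to the start)."""
    if not 0 < k <= len(barcode):
        raise ValueError("k must be between 1 and the barcode length")
    # Appending the first k-1 bases lets every start position yield a full k-mer.
    wrapped = barcode + barcode[:k - 1]
    return [wrapped[i:i + k] for i in range(len(barcode))]


def shared_kmers(true_bc: str, observed_bc: str, k: int, circular: bool) -> int:
    """Number of distinct k-mers the observed barcode shares with the true one."""
    kmers = circular_kmers if circular else linear_kmers
    return len(set(kmers(true_bc, k)) & set(kmers(observed_bc, k)))


# Hypothetical 12-nt barcode and a copy with one mismatch at index 5 (G -> A).
TRUE_BARCODE = "ACGTTGCAAGCT"
OBSERVED_BARCODE = "ACGTTACAAGCT"
K = 8  # large relative to the barcode length, as in the regime described above

linear_shared = shared_kmers(TRUE_BARCODE, OBSERVED_BARCODE, K, circular=False)
circular_shared = shared_kmers(TRUE_BARCODE, OBSERVED_BARCODE, K, circular=True)
print(linear_shared, circular_shared)  # prints: 0 4
```

With these inputs, every one of the five linear 8-mers overlaps the mismatch, so none survives. Circularization yields twelve 8-mers, four of which wrap around the ends without touching the error and therefore still match the true barcode.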
first_indexed 2024-12-19T13:35:33Z
format Article
id doaj.art-451d82d819a24aa0ab7aa3244846e3f7
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-12-19T13:35:33Z
publishDate 2019-01-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-451d82d819a24aa0ab7aa3244846e3f7 | 2022-12-21T20:19:13Z | eng | BMC | BMC Bioinformatics | 1471-2105 | 2019-01-01 | volume 20, issue 1, pages 1-9 | 10.1186/s12859-019-2612-0 | Barcode identification for single cell genomics | Akshay Tambe (Division of Biology and Biological Engineering, California Institute of Technology) | Lior Pachter (Departments of Biology and Computing & Mathematical Sciences, California Institute of Technology) | http://link.springer.com/article/10.1186/s12859-019-2612-0 | Single-cell | Barcodes | Barcode identification | de Bruijn graph | Circularization | K-mer counting
title Barcode identification for single cell genomics
topic Single-cell
Barcodes
Barcode identification
de Bruijn graph
Circularization
K-mer counting