Barcode identification for single cell genomics
Abstract. Background: Single-cell sequencing experiments use short DNA barcode ‘tags’ to identify reads that originate from the same cell. In order to recover single-cell information from such experiments, reads must be grouped based on their barcode tag, a crucial processing step that precedes other...
Main Authors: | Akshay Tambe, Lior Pachter |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2019-01-01 |
Series: | BMC Bioinformatics |
Subjects: | Single-cell, Barcodes, Barcode identification, de Bruijn graph, Circularization, K-mer counting |
Online Access: | http://link.springer.com/article/10.1186/s12859-019-2612-0 |
_version_ | 1818876008210628608 |
---|---|
author | Akshay Tambe Lior Pachter |
author_facet | Akshay Tambe Lior Pachter |
author_sort | Akshay Tambe |
collection | DOAJ |
description | Abstract. Background: Single-cell sequencing experiments use short DNA barcode ‘tags’ to identify reads that originate from the same cell. In order to recover single-cell information from such experiments, reads must be grouped based on their barcode tag, a crucial processing step that precedes other computations. However, this step can be difficult due to the high rates of mismatch and deletion errors that can afflict barcodes. Results: Here we present an approach to identify and error-correct barcodes by traversing the de Bruijn graph of circularized barcode k-mers. Our approach is based on the observation that circularizing a barcode sequence can yield error-free k-mers even when k is large relative to the length of the barcode sequence, a regime that is typical of single-cell barcoding applications. This allows for assignment of reads to consensus fingerprints constructed from k-mers. Conclusion: We show that, for single-cell RNA-Seq, circularization improves the recovery of accurate single-cell transcriptome estimates, especially when there is a high number of errors per read. This approach is robust to the type of error (mismatch, insertion, deletion), as well as to the relative abundances of the cells. Sircel, a software package that implements this approach, is described and publicly available. |
first_indexed | 2024-12-19T13:35:33Z |
format | Article |
id | doaj.art-451d82d819a24aa0ab7aa3244846e3f7 |
institution | Directory Open Access Journal |
issn | 1471-2105 |
language | English |
last_indexed | 2024-12-19T13:35:33Z |
publishDate | 2019-01-01 |
publisher | BMC |
record_format | Article |
series | BMC Bioinformatics |
spelling | doaj.art-451d82d819a24aa0ab7aa3244846e3f7; 2022-12-21T20:19:13Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2019-01-01; 20119; 10.1186/s12859-019-2612-0; Barcode identification for single cell genomics; Akshay Tambe 0; Lior Pachter 1; Division of Biology and Biological Engineering, California Institute of Technology; Departments of Biology and Computing & Mathematical Sciences, California Institute of Technology; Abstract. Background: Single-cell sequencing experiments use short DNA barcode ‘tags’ to identify reads that originate from the same cell. In order to recover single-cell information from such experiments, reads must be grouped based on their barcode tag, a crucial processing step that precedes other computations. However, this step can be difficult due to the high rates of mismatch and deletion errors that can afflict barcodes. Results: Here we present an approach to identify and error-correct barcodes by traversing the de Bruijn graph of circularized barcode k-mers. Our approach is based on the observation that circularizing a barcode sequence can yield error-free k-mers even when k is large relative to the length of the barcode sequence, a regime that is typical of single-cell barcoding applications. This allows for assignment of reads to consensus fingerprints constructed from k-mers. Conclusion: We show that, for single-cell RNA-Seq, circularization improves the recovery of accurate single-cell transcriptome estimates, especially when there is a high number of errors per read. This approach is robust to the type of error (mismatch, insertion, deletion), as well as to the relative abundances of the cells. Sircel, a software package that implements this approach, is described and publicly available.; http://link.springer.com/article/10.1186/s12859-019-2612-0; Single-cell; Barcodes; Barcode identification; de Bruijn graph; Circularization; K-mer counting |
spellingShingle | Akshay Tambe Lior Pachter Barcode identification for single cell genomics BMC Bioinformatics Single-cell Barcodes Barcode identification de Bruijn graph Circularization K-mer counting |
title | Barcode identification for single cell genomics |
title_full | Barcode identification for single cell genomics |
title_fullStr | Barcode identification for single cell genomics |
title_full_unstemmed | Barcode identification for single cell genomics |
title_short | Barcode identification for single cell genomics |
title_sort | barcode identification for single cell genomics |
topic | Single-cell Barcodes Barcode identification de Bruijn graph Circularization K-mer counting |
url | http://link.springer.com/article/10.1186/s12859-019-2612-0 |
work_keys_str_mv | AT akshaytambe barcodeidentificationforsinglecellgenomics AT liorpachter barcodeidentificationforsinglecellgenomics |
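
The circularization idea in the description can be illustrated with a small sketch. This is not Sircel's code or API: the function names, the 12-base barcode, and the choice k = 8 are all hypothetical, picked only to show why wrapping a barcode around on itself can leave some k-mers error-free when k is large relative to the barcode length.

```python
# Illustrative sketch only; names and values are assumptions, not Sircel's API.

def linear_kmers(seq, k):
    """All k-mers read left to right without wrapping (len(seq) - k + 1 of them)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def circular_kmers(seq, k):
    """All k-mers of the circularized sequence (exactly len(seq) of them)."""
    if k > len(seq):
        raise ValueError("k must not exceed the barcode length")
    # Appending the first k - 1 bases lets windows wrap past the end.
    wrapped = seq + seq[:k - 1]
    return [wrapped[i:i + k] for i in range(len(seq))]


def shared_kmers(a, b, k, kmer_fn):
    """K-mers present in both sequences under the given k-mer scheme."""
    return set(kmer_fn(a, k)) & set(kmer_fn(b, k))


# Hypothetical 12-base barcode with one mismatch at position 5 (G -> C).
true_barcode = "ACGTTGCAAGCT"
observed = true_barcode[:5] + "C" + true_barcode[6:]
k = 8

linear_shared = shared_kmers(true_barcode, observed, k, linear_kmers)
circular_shared = shared_kmers(true_barcode, observed, k, circular_kmers)

print(len(linear_shared))    # every linear window covers the error
print(len(circular_shared))  # windows that wrap around avoid the error
```

In this toy case every linear 8-mer overlaps the mismatch, so the erroneous read shares no k-mers with the true barcode, whereas four of the twelve circular 8-mers avoid the error and still match. Building a de Bruijn graph over such circular k-mers, as the description outlines, is a further step this sketch does not attempt.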