Matrix prior for data transfer between single cell data types in latent Dirichlet allocation.

Bibliographic Details
Main Authors: Alan Min, Timothy Durham, Louis Gevirtzman, William Stafford Noble
Format: Article
Language: English
Published: Public Library of Science (PLoS), 2023-05-01
Series: PLoS Computational Biology
ISSN: 1553-734X, 1553-7358
Online Access: https://doi.org/10.1371/journal.pcbi.1011049

Description

Single cell ATAC-seq (scATAC-seq) enables the mapping of regulatory elements in fine-grained cell types. Despite this advance, analysis of the resulting data is challenging, and large-scale scATAC-seq data are difficult to obtain and expensive to generate. This motivates a method to leverage information from previously generated large-scale scATAC-seq or scRNA-seq data to guide our analysis of new scATAC-seq datasets. We analyze scATAC-seq data using latent Dirichlet allocation (LDA), a Bayesian model originally developed for text corpora, which summarizes documents as mixtures of topics defined by the words that distinguish the documents. When applied to scATAC-seq, LDA treats cells as documents and their accessible sites as words, identifying "topics" based on the cell type-specific accessible sites in those cells. Previous work used uniform symmetric priors in LDA, but we hypothesized that nonuniform matrix priors generated from LDA models trained on existing datasets may improve the detection of cell types in new datasets, especially those with relatively few cells. In this work, we test this hypothesis in scATAC-seq data from whole C. elegans nematodes and SHARE-seq data from mouse skin cells. We show that nonsymmetric matrix priors for LDA improve our ability to capture cell type information from small scATAC-seq datasets.
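
To make the cells-as-documents mapping and the role of a matrix prior concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes a binary cells × sites accessibility matrix and uses standard collapsed Gibbs sampling, which may differ from the inference used in the paper. The prior-construction rule (the hypothetical `matrix_prior` function, with illustrative `prior_weight` and `floor` values) simply rescales the topic-word matrix of a reference model that has already been trained; none of these names or settings come from the paper. The only change from the symmetric-prior case is that the Dirichlet parameter η is indexed by topic and site, so each topic's normalizer uses its own row sum of η.

```python
import random


def cells_to_documents(binary_matrix):
    """Map a cells x sites 0/1 accessibility matrix to LDA documents:
    each cell (document) becomes the list of accessible site indices (words)."""
    return [[j for j, v in enumerate(row) if v] for row in binary_matrix]


def matrix_prior(reference_topic_word, prior_weight=100.0, floor=0.01):
    """Hypothetical rule: turn a K x V reference topic-word matrix into a
    K x V Dirichlet prior by normalizing each row, scaling it by
    prior_weight, and adding a small floor so every entry is positive."""
    prior = []
    for row in reference_topic_word:
        total = sum(row)
        prior.append([floor + prior_weight * x / total for x in row])
    return prior


def gibbs_lda(docs, eta, alpha=0.1, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA with a K x V topic-word prior eta.
    Returns the cell-topic proportions (theta) from the final sample."""
    rng = random.Random(seed)
    num_topics, num_sites = len(eta), len(eta[0])
    eta_sum = [sum(row) for row in eta]                  # per-topic prior mass
    n_dk = [[0] * num_topics for _ in docs]              # tokens per (cell, topic)
    n_kw = [[0] * num_sites for _ in range(num_topics)]  # tokens per (topic, site)
    n_k = [0] * num_topics                               # tokens per topic
    z = []                                               # topic of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Conditional: (n_dk + alpha) * (n_kw + eta_kw) / (n_k + sum_w eta_kw).
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + eta[t][w])
                           / (n_k[t] + eta_sum[t]) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return [[(n_dk[d][t] + alpha) / (len(doc) + num_topics * alpha) for t in topics]
            for d, doc in enumerate(docs)]


if __name__ == "__main__":
    # Toy reference model: topic 0 marks sites 0-2, topic 1 marks sites 3-5.
    eta = matrix_prior([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]])
    cells = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]
    for row in gibbs_lda(cells_to_documents(cells), eta):
        print([round(p, 2) for p in row])
```

The floor keeps every Dirichlet parameter strictly positive even where the reference model assigns a site zero weight. A larger `prior_weight` pulls the new model's topics more strongly toward the reference topics, which matters most when the new dataset has few cells. Pure Python is far too slow for real scATAC-seq matrices; the sketch only illustrates the structure of the computation.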