Modeling Persistent Trends in Distributions

Bibliographic Details
Main Authors: Mueller, Jonas; Jaakkola, Tommi; Gifford, David
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format: Article
Language: English
Published: Informa UK Limited 2021
Online Access: https://hdl.handle.net/1721.1/135779
Description: © 2018 American Statistical Association. We present a nonparametric framework for modeling a short sequence of probability distributions that vary due both to underlying effects of sequential progression and to confounding noise. To distinguish between these two types of variation and estimate the sequential-progression effects, our approach assumes that these effects follow a persistent trend. This work is motivated by the recent rise of single-cell RNA-sequencing experiments over brief time courses, which aim to identify genes relevant to the progression of a particular biological process across diverse cell populations. Whereas classical statistical tools focus on scalar-response regression or on order-agnostic differences between distributions, in this setting it is desirable to consider both the full distributions and the structure imposed by their ordering. We introduce a new regression model for ordinal covariates in which the responses are univariate distributions and the underlying relationship reflects consistent changes in the distributions over increasing levels of the covariate. We formalize this concept as a trend in distributions, defined as an evolution that is linear under the Wasserstein metric. Implemented via a fast alternating projections algorithm, our method exhibits numerous strengths in simulations and in analyses of single-cell gene expression data. Supplementary materials for this article are available online.
Journal: Journal of the American Statistical Association, 2018
DOI: 10.1080/01621459.2017.1341412
License: Creative Commons Attribution-Noncommercial-Share Alike (http://creativecommons.org/licenses/by-nc-sa/4.0/)
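The description defines a trend in distributions as an evolution that is linear under the Wasserstein metric. The Python sketch below illustrates one way to make that concrete; it is not the authors' code, and the helper names (`sorted_quantiles`, `wasserstein1`, `is_monotone`, `follows_trend`) are hypothetical. It relies on the standard fact that for univariate distributions with quantile functions Q and Q', the Wasserstein-1 distance is the integral over p in (0, 1) of |Q(p) - Q'(p)|, and it assumes equal-size samples at every covariate level. As an assumed reading of "trend", not one taken from this record, it checks whether each quantile moves in a single direction across the ordered levels.

```python
from typing import List, Sequence


def sorted_quantiles(sample: Sequence[float]) -> List[float]:
    # For n equally weighted points, the sorted sample is the empirical
    # quantile function evaluated on the grid p = 1/n, 2/n, ..., 1.
    return sorted(sample)


def wasserstein1(a: Sequence[float], b: Sequence[float]) -> float:
    # Univariate Wasserstein-1 distance: the integral over p in (0, 1) of
    # |Q_a(p) - Q_b(p)|. For two samples of equal size n this reduces to the
    # mean absolute difference between matched order statistics.
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("this sketch assumes two non-empty samples of equal size")
    qa, qb = sorted_quantiles(a), sorted_quantiles(b)
    return sum(abs(x - y) for x, y in zip(qa, qb)) / len(qa)


def is_monotone(values: Sequence[float]) -> bool:
    # True if the sequence is entirely nondecreasing or entirely nonincreasing.
    pairs = list(zip(values, values[1:]))
    return all(x <= y for x, y in pairs) or all(x >= y for x, y in pairs)


def follows_trend(samples: Sequence[Sequence[float]]) -> bool:
    # Hypothetical criterion (an assumption, not taken from the record):
    # at every quantile level, the quantiles across the ordered covariate
    # levels move in one direction. Different quantiles may move in
    # different directions, e.g. for a distribution that spreads out.
    if not samples:
        return True
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("this sketch assumes equal-size samples at every level")
    quantiles = [sorted_quantiles(s) for s in samples]
    return all(is_monotone([q[i] for q in quantiles]) for i in range(n))


# Three ordered covariate levels whose quantiles each shift in one direction.
levels = [[0.0, 1.0, 2.0], [0.5, 1.0, 3.0], [1.0, 1.5, 3.5]]
d01 = wasserstein1(levels[0], levels[1])
d12 = wasserstein1(levels[1], levels[2])
d02 = wasserstein1(levels[0], levels[2])
print(follows_trend(levels), d01, d12, d02)  # True 0.5 0.5 1.0
```

For these three levels, the distance from the first level to the third (1.0) equals the sum of the two consecutive distances (0.5 + 0.5). This additivity holds whenever each quantile of the middle level lies between the corresponding quantiles of its neighbors, because then |Q3(p) - Q1(p)| = |Q3(p) - Q2(p)| + |Q2(p) - Q1(p)| for every p. A sequence that reverses direction, such as [[0, 1, 2], [1, 2, 3], [0, 1, 2]], fails both the monotonicity check and additivity.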
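The description states only that the method is implemented via a fast alternating projections algorithm; the algorithm itself is not given here. As a hedged illustration of the kind of projection step such a fit could involve, the Python sketch below assumes, hypothetically, that a trend means each quantile moves monotonically across the ordered covariate levels, and projects each quantile sequence onto the closest monotone sequence in least squares using the pool-adjacent-violators algorithm (isotonic regression). This is a swapped-in standard technique, not the authors' alternating projections algorithm, and the function names (`isotonic_increasing`, `monotone_projection`, `fit_monotone_quantiles`) are hypothetical.

```python
from typing import List, Sequence


def isotonic_increasing(y: Sequence[float]) -> List[float]:
    # Least-squares projection of y onto nondecreasing sequences using the
    # pool-adjacent-violators algorithm: merge neighboring blocks whose means
    # violate the ordering and replace each block by its mean.
    sums: List[float] = []
    counts: List[int] = []
    for v in y:
        sums.append(float(v))
        counts.append(1)
        while len(sums) > 1 and sums[-2] / counts[-2] > sums[-1] / counts[-1]:
            s, c = sums.pop(), counts.pop()
            sums[-1] += s
            counts[-1] += c
    fitted: List[float] = []
    for s, c in zip(sums, counts):
        fitted.extend([s / c] * c)
    return fitted


def monotone_projection(y: Sequence[float]) -> List[float]:
    # Fit both a nondecreasing and a nonincreasing sequence and keep the one
    # with the smaller squared error, so each quantile picks its own direction.
    up = isotonic_increasing(y)
    down = [-v for v in isotonic_increasing([-v for v in y])]

    def sse(fit: List[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(y, fit))

    return up if sse(up) <= sse(down) else down


def fit_monotone_quantiles(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    # rows[lvl][i] is the i-th quantile of the distribution at covariate level lvl.
    # Each quantile column is projected separately across the ordered levels.
    n_levels, n_quantiles = len(rows), len(rows[0])
    columns = [monotone_projection([row[i] for row in rows]) for i in range(n_quantiles)]
    return [[columns[i][lvl] for i in range(n_quantiles)] for lvl in range(n_levels)]


noisy = [[0.0, 1.0, 2.0], [1.0, 1.5, 2.5], [0.5, 2.0, 3.0]]
print(fit_monotone_quantiles(noisy))
# [[0.0, 1.0, 2.0], [0.75, 1.5, 2.5], [0.75, 2.0, 3.0]]
```

Because least-squares isotonic regression preserves pointwise order between its inputs, the fitted rows stay sorted, and therefore remain valid quantile sequences, when every column is fitted in the same direction, as in this example. When different quantiles choose different directions, sortedness of the rows is not guaranteed, and this sketch makes no attempt to enforce it.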