Modeling Persistent Trends in Distributions
© 2018 American Statistical Association. We present a nonparametric framework to model a short sequence of probability distributions that vary both due to underlying effects of sequential progression and confounding noise. To distinguish between these two types of variation and estimate the...
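Main Authors: | Mueller, Jonas; Jaakkola, Tommi; Gifford, David |
---|---|
Other Authors: | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory |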
Format: | Article |
Language: | English |
Published: | Informa UK Limited, 2021 |
Online Access: | https://hdl.handle.net/1721.1/135779 |
_version_ | 1811094287909126144 |
---|---|
author | Mueller, Jonas; Jaakkola, Tommi; Gifford, David |
author2 | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory |
collection | MIT |
description | © 2018 American Statistical Association. We present a nonparametric framework to model a short sequence of probability distributions that vary both due to underlying effects of sequential progression and confounding noise. To distinguish between these two types of variation and estimate the sequential-progression effects, our approach leverages an assumption that these effects follow a persistent trend. This work is motivated by the recent rise of single-cell RNA-sequencing experiments over a brief time course, which aim to identify genes relevant to the progression of a particular biological process across diverse cell populations. While classical statistical tools focus on scalar-response regression or order-agnostic differences between distributions, it is desirable in this setting to consider both the full distributions as well as the structure imposed by their ordering. We introduce a new regression model for ordinal covariates where responses are univariate distributions and the underlying relationship reflects consistent changes in the distributions over increasing levels of the covariate. This concept is formalized as a trend in distributions, which we define as an evolution that is linear under the Wasserstein metric. Implemented via a fast alternating projections algorithm, our method exhibits numerous strengths in simulations and analyses of single-cell gene expression data. Supplementary materials for this article are available online. |
first_indexed | 2024-09-23T15:57:39Z |
format | Article |
id | mit-1721.1/135779 |
institution | Massachusetts Institute of Technology |
language | English |
last_indexed | 2024-09-23T15:57:39Z |
publishDate | 2021 |
publisher | Informa UK Limited |
record_format | dspace |
doi | 10.1080/01621459.2017.1341412 |
journal | Journal of the American Statistical Association |
issued | 2018 |
rights | Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/ |
url | https://hdl.handle.net/1721.1/135779 |
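The description above defines a trend in distributions as an evolution that is linear under the Wasserstein metric. For univariate distributions, the 2-Wasserstein distance equals the L2 distance between quantile functions, so a linear evolution can be pictured as a straight line in quantile space. The following is a minimal sketch of that idea only, not the authors' alternating projections algorithm; the function names, the equal-sample-size simplification, and the simulated data are all assumptions made for illustration.

```python
import math
import random


def empirical_quantiles(samples):
    """Sorted samples serve as the empirical quantile function on an even grid."""
    return sorted(float(s) for s in samples)


def wasserstein2(q_a, q_b):
    """2-Wasserstein distance between two equal-size empirical distributions,
    computed as the root-mean-square gap between their sorted samples."""
    if len(q_a) != len(q_b):
        raise ValueError("this sketch assumes equal sample sizes")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q_a, q_b)) / len(q_a))


def interpolate(q_start, q_end, t):
    """Point at fraction t along the straight line between two quantile functions."""
    return [(1 - t) * a + t * b for a, b in zip(q_start, q_end)]


# Hypothetical data: a starting population and a shifted, rescaled end population.
rng = random.Random(0)
x0 = [rng.gauss(0.0, 1.0) for _ in range(500)]
x1 = [1.5 * v + 3.0 for v in x0]

q0 = empirical_quantiles(x0)
q1 = empirical_quantiles(x1)

d_full = wasserstein2(q0, q1)
d_half = wasserstein2(q0, interpolate(q0, q1, 0.5))

# Along a linear path in quantile space, distance from the start grows in proportion to t.
print(f"W2(start, end)      = {d_full:.4f}")
print(f"W2(start, midpoint) = {d_half:.4f}")
```

The midpoint sits at half the full distance from the start, which is the sense in which such a path is linear under this metric.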