Correcting nuisance variation using Wasserstein distance

Bibliographic Details
Main Authors: Gil Tabak, Minjie Fan, Samuel Yang, Stephan Hoyer, Geoffrey Davis
Format: Article
Language: English
Published: PeerJ Inc. 2020-02-01
Series: PeerJ
ISSN: 2167-8359
DOI: 10.7717/peerj.8594
Online Access: https://peerj.com/articles/8594.pdf

Abstract
Profiling cellular phenotypes from microscopic imaging can provide meaningful biological information resulting from various factors affecting the cells. One motivating application is drug development: morphological cell features can be captured from images, from which similarities between different drug compounds applied at different doses can be quantified. The general approach is to find a function mapping the images to an embedding space of manageable dimensionality whose geometry captures relevant features of the input images. An important known issue for such methods is separating relevant biological signal from nuisance variation. For example, the embedding vectors tend to be more correlated for cells that were cultured and imaged during the same week than for those from different weeks, despite having identical drug compounds applied in both cases. In this case, the particular batch in which a set of experiments were conducted constitutes the domain of the data; an ideal set of image embeddings should contain only the relevant biological information (e.g., drug effects). We develop a general framework for adjusting the image embeddings in order to “forget” domain-specific information while preserving relevant biological information. To achieve this, we minimize a loss function based on distances between marginal distributions (such as the Wasserstein distance) of embeddings across domains for each replicated treatment. For the dataset we present results with, the only replicated treatment happens to be the negative control treatment, for which we do not expect any treatment-induced cell morphology changes. We find that for our transformed embeddings (i) the underlying geometric structure is not only preserved but the embeddings also carry improved biological signal; and (ii) less domain-specific information is present.
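The cross-domain loss described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the sliced Wasserstein approximation (averaging exact 1-D Wasserstein-1 distances over random projections) is swapped in for whatever Wasserstein estimator the paper uses, summing over all pairs of domains is an assumed aggregation, the helper names `sliced_wasserstein` and `nuisance_loss` are hypothetical, and the toy data and per-domain mean-centering baseline are invented for illustration.

```python
import itertools

import numpy as np


def sliced_wasserstein(x, y, n_projections=64, seed=0):
    """Sliced approximation of the Wasserstein-1 distance between two samples.

    Projects both samples onto random unit directions and averages the exact
    1-D Wasserstein-1 distances. Assumes x and y have the same number of rows.
    """
    assert x.shape[0] == y.shape[0], "sketch assumes equal-size samples"
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(n_projections, x.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    # In 1-D, pairing sorted values is the optimal transport plan.
    px = np.sort(x @ dirs.T, axis=0)
    py = np.sort(y @ dirs.T, axis=0)
    return float(np.mean(np.abs(px - py)))


def nuisance_loss(embeddings, treatments, domains, n_projections=64, seed=0):
    """Hypothetical loss: summed pairwise cross-domain distances per treatment.

    Treatments observed in only one domain have no domain pairs and add 0,
    so only replicated treatments contribute.
    """
    total = 0.0
    for t in np.unique(treatments):
        in_t = treatments == t
        for d1, d2 in itertools.combinations(np.unique(domains[in_t]), 2):
            x = embeddings[in_t & (domains == d1)]
            y = embeddings[in_t & (domains == d2)]
            total += sliced_wasserstein(x, y, n_projections, seed)
    return total


# Toy data (invented): negative-control embeddings from two weekly batches.
rng = np.random.default_rng(1)
n = 200
base = rng.normal(size=(2 * n, 8))
domains = np.repeat(["week1", "week2"], n)
treatments = np.full(2 * n, "control")
# Simulated batch effect: every week2 embedding is offset by 1.5 per dimension.
emb = base + np.where(domains[:, None] == "week2", 1.5, 0.0)

# Naive correction for comparison: subtract each domain's mean embedding.
centered = emb.copy()
for d in np.unique(domains):
    centered[domains == d] -= centered[domains == d].mean(axis=0)

print(nuisance_loss(emb, treatments, domains))       # large: batches differ
print(nuisance_loss(centered, treatments, domains))  # much smaller
```

Mean-centering is used here only to show that the loss responds to a batch shift; in the framework the abstract describes, the embeddings would instead be adjusted by a transformation chosen to reduce such a loss while preserving biological signal.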

Keywords: Wasserstein distance; Cellular phenotyping; Batch effect; Embedding; Minimax; Optimal transport