Population scale latent space cohort matching for the improved use and exploration of observational trial data

Bibliographic Details
Main Authors: Rachel Gologorsky, Sulaiman S. Somani, Sean N. Neifert, Aly A. Valliani, Katherine E. Link, Viola J. Chen, Anthony B. Costa, Eric K. Oermann
Format: Article
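Language: English
Published: AIMS Press 2022-05-01
Series: Mathematical Biosciences and Engineering
Subjects: artificial intelligence, autoencoders, cohort matching, data visualization, deep learning, manifold learning
Online Access: https://www.aimspress.com/article/doi/10.3934/mbe.2022320?viewType=HTML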
collection DOAJ
description A significant amount of clinical research is observational by nature and derived from medical records, clinical trials, and large-scale registries. While there is no substitute for randomized, controlled experimentation, such experiments or trials are often costly, time consuming, and even ethically or practically impossible to execute. Combining classical regression and structural equation modeling with matching techniques can leverage the value of observational data. Nevertheless, identifying variables of greatest interest in high-dimensional data is frequently challenging, even with application of classical dimensionality reduction and/or propensity scoring techniques. Here, we demonstrate that projecting high-dimensional medical data onto a lower-dimensional manifold using deep autoencoders and post-hoc generation of treatment/control cohorts based on proximity in the lower-dimensional space results in better matching of confounding variables compared to classical propensity score matching (PSM) in the original high-dimensional space (P<0.0001) and performs similarly to PSM models constructed by experts with prior knowledge of the underlying pathology when evaluated on predicting risk ratios from real-world clinical data. Thus, in cases when the underlying problem is poorly understood and the data is high-dimensional in nature, matching in the autoencoder latent space might be of particular benefit.
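The description above outlines the matching step: encode each patient with a deep autoencoder, pair treated and control patients that lie close together in the latent space, and judge the result by how well confounding variables are balanced. The Python sketch below illustrates that idea under stated assumptions and is not the authors' implementation. It takes the latent codes as given (the trained encoder is omitted), uses greedy 1:1 nearest-neighbour matching on Euclidean distance with an optional caliper, and checks balance with the standardized mean difference. The function names, distance metric, caliper, and toy data are hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two latent codes of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_latent_match(
    treated: Dict[str, Vector],
    controls: Dict[str, Vector],
    caliper: float = math.inf,
) -> List[Tuple[str, str]]:
    """Greedy 1:1 nearest-neighbour matching without replacement.

    `treated` and `controls` map patient IDs to latent codes, assumed to come
    from the encoder of an already-trained autoencoder (not shown here).
    All treated/control pairs are ranked by latent distance; the closest
    remaining pair is accepted first, each patient is used at most once, and
    pairs farther apart than `caliper` are never accepted.
    """
    candidates = sorted(
        (euclidean(z_t, z_c), t_id, c_id)
        for t_id, z_t in treated.items()
        for c_id, z_c in controls.items()
    )
    used_t: Set[str] = set()
    used_c: Set[str] = set()
    pairs: List[Tuple[str, str]] = []
    for dist, t_id, c_id in candidates:
        if dist > caliper:
            break  # candidates are sorted, so every later pair is farther
        if t_id in used_t or c_id in used_c:
            continue  # one of the two patients is already matched
        used_t.add(t_id)
        used_c.add(c_id)
        pairs.append((t_id, c_id))
    return pairs


def standardized_mean_difference(a: Sequence[float], b: Sequence[float]) -> float:
    """Standardized mean difference of one covariate between two groups.

    Uses the pooled standard deviation sqrt((s_a^2 + s_b^2) / 2); each group
    needs at least two values. Values near 0 indicate good balance.
    """
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
    pooled_sd = math.sqrt((var_a + var_b) / 2)
    return (mean_a - mean_b) / pooled_sd if pooled_sd > 0 else 0.0


if __name__ == "__main__":
    # Toy 2-D latent codes (hypothetical values, not data from the article).
    treated = {"t1": (0.0, 0.0), "t2": (5.0, 5.0)}
    controls = {"c1": (0.1, 0.0), "c2": (5.0, 4.8), "c3": (10.0, 10.0)}
    print(greedy_latent_match(treated, controls))  # [('t1', 'c1'), ('t2', 'c2')]
```

Greedy matching is the simplest choice; optimal matching (for example via the Hungarian algorithm) minimizes the total distance over all pairs at higher computational cost. After matching, one way to compare against propensity score matching is to compute `standardized_mean_difference` on each original covariate for the matched treated and control patients produced by each method.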
first_indexed 2024-04-14T01:14:17Z
id doaj.art-97d4619039ca4ab281fa8cabe1e8c0c0
institution Directory Open Access Journal
issn 1551-0018
last_indexed 2024-04-14T01:14:17Z
Citation: Mathematical Biosciences and Engineering, vol. 19, no. 7, pp. 6795-6813
DOI: 10.3934/mbe.2022320
Affiliations:
1. Department of Medicine, Icahn School of Medicine, New York, NY 10028, USA
2. Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
3. Department of Neurosurgery, NYU Grossman School of Medicine, New York, NY 10016, USA
4. Oncology Early development, Merck & Co., Inc., Kenilworth, NJ 07033, USA
5. NVIDIA, Santa Clara, CA 95051, USA
6. Department of Radiology, NYU Grossman School of Medicine, New York, NY 10016, USA