Population scale latent space cohort matching for the improved use and exploration of observational trial data
A significant amount of clinical research is observational by nature and derived from medical records, clinical trials, and large-scale registries. While there is no substitute for randomized, controlled experimentation, such experiments or trials are often costly, time consuming, and even ethically...
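Main Authors: | Rachel Gologorsky, Sulaiman S. Somani, Sean N. Neifert, Aly A. Valliani, Katherine E. Link, Viola J. Chen, Anthony B. Costa, Eric K. Oermann |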
---|---|
Format: | Article |
Language: | English |
Published: | AIMS Press, 2022-05-01 |
Series: | Mathematical Biosciences and Engineering |
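Subjects: | artificial intelligence, autoencoders, cohort matching, data visualization, deep learning, manifold learning |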
Online Access: | https://www.aimspress.com/article/doi/10.3934/mbe.2022320?viewType=HTML |
_version_ | 1817991514907738112 |
---|---|
author | Rachel Gologorsky, Sulaiman S. Somani, Sean N. Neifert, Aly A. Valliani, Katherine E. Link, Viola J. Chen, Anthony B. Costa, Eric K. Oermann |
author_sort | Rachel Gologorsky |
collection | DOAJ |
description | A significant amount of clinical research is observational by nature and derived from medical records, clinical trials, and large-scale registries. While there is no substitute for randomized, controlled experimentation, such experiments or trials are often costly, time consuming, and even ethically or practically impossible to execute. Combining classical regression and structural equation modeling with matching techniques can leverage the value of observational data. Nevertheless, identifying variables of greatest interest in high-dimensional data is frequently challenging, even with application of classical dimensionality reduction and/or propensity scoring techniques. Here, we demonstrate that projecting high-dimensional medical data onto a lower-dimensional manifold using deep autoencoders and post-hoc generation of treatment/control cohorts based on proximity in the lower-dimensional space results in better matching of confounding variables compared to classical propensity score matching (PSM) in the original high-dimensional space (P<0.0001) and performs similarly to PSM models constructed by experts with prior knowledge of the underlying pathology when evaluated on predicting risk ratios from real-world clinical data. Thus, in cases when the underlying problem is poorly understood and the data is high-dimensional in nature, matching in the autoencoder latent space might be of particular benefit. |
first_indexed | 2024-04-14T01:14:17Z |
format | Article |
id | doaj.art-97d4619039ca4ab281fa8cabe1e8c0c0 |
institution | Directory of Open Access Journals |
issn | 1551-0018 |
language | English |
last_indexed | 2024-04-14T01:14:17Z |
publishDate | 2022-05-01 |
publisher | AIMS Press |
record_format | Article |
series | Mathematical Biosciences and Engineering |
spelling | doaj.art-97d4619039ca4ab281fa8cabe1e8c0c0; 2022-12-22T02:20:56Z; eng; AIMS Press; Mathematical Biosciences and Engineering; ISSN 1551-0018; 2022-05-01; vol. 19, no. 7, pp. 6795-6813; doi:10.3934/mbe.2022320; Population scale latent space cohort matching for the improved use and exploration of observational trial data; Authors: Rachel Gologorsky, Sulaiman S. Somani, Sean N. Neifert, Aly A. Valliani, Katherine E. Link, Viola J. Chen, Anthony B. Costa, Eric K. Oermann; Affiliations: 1. Department of Medicine, Icahn School of Medicine, New York, NY 10028, USA; 2. Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA; 3. Department of Neurosurgery, NYU Grossman School of Medicine, New York, NY 10016, USA; 4. Oncology Early Development, Merck & Co., Inc., Kenilworth, NJ 07033, USA; 5. NVIDIA, Santa Clara, CA 95051, USA; 6. Department of Radiology, NYU Grossman School of Medicine, New York, NY 10016, USA; https://www.aimspress.com/article/doi/10.3934/mbe.2022320?viewType=HTML; Keywords: artificial intelligence, autoencoders, cohort matching, data visualization, deep learning, manifold learning |
title | Population scale latent space cohort matching for the improved use and exploration of observational trial data |
topic | artificial intelligence, autoencoders, cohort matching, data visualization, deep learning, manifold learning |
url | https://www.aimspress.com/article/doi/10.3934/mbe.2022320?viewType=HTML |
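
The abstract describes building treatment/control cohorts from proximity in an autoencoder latent space. The matching step alone might look roughly like the sketch below. It assumes the latent vectors have already been produced by an encoder, which is not shown. The greedy 1:1 nearest-neighbour rule without replacement and the Euclidean distance are illustrative assumptions, since this record does not state which matching rule the authors used. The function name `match_cohorts` and the toy vectors are hypothetical.

```python
import math


def match_cohorts(treated, controls):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    treated, controls: dicts mapping a patient id to a latent vector
    (a sequence of floats, all of equal length). Returns a list of
    (treated_id, control_id, distance) tuples.
    Assumption: Euclidean distance in latent space is the similarity measure.
    """
    available = dict(controls)  # controls that have not been matched yet
    pairs = []
    for t_id, t_vec in treated.items():
        if not available:
            break  # more treated patients than controls: stop matching
        # Pick the closest remaining control in latent space.
        c_id = min(available, key=lambda cid: math.dist(t_vec, available[cid]))
        pairs.append((t_id, c_id, math.dist(t_vec, available[c_id])))
        del available[c_id]  # without replacement: each control is used once
    return pairs


# Toy 2-D latent vectors (hypothetical values, for illustration only).
treated = {"t1": (0.0, 0.0), "t2": (5.0, 5.0)}
controls = {"c1": (4.0, 5.0), "c2": (0.0, 1.0), "c3": (9.0, 9.0)}

pairs = match_cohorts(treated, controls)
for t_id, c_id, d in pairs:
    print(t_id, c_id, round(d, 3))
```

Greedy matching depends on the order in which treated patients are visited. An optimal assignment or a distance caliper could be substituted for it without changing the overall idea.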