Mixture of Regressions with Multivariate Responses for Discovering Subtypes in Alzheimer’s Biomarkers with Detection Limits

Bibliographic Details
Main Authors: Ganzhong Tian, John Hanfelt, James Lah, Benjamin B. Risk
Format: Article
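Language: English
Published: Taylor & Francis Group, 2024-12-01
Series: Data Science in Science
Subjects: Alzheimer’s disease; censored Gaussian mixture of regressions; clustering; finite mixture model; latent class analysis; Tobit model
Online Access: https://www.tandfonline.com/doi/10.1080/26941899.2024.2309403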
ISSN: 2694-1899
Volume: 3
Issue: 1
DOI: 10.1080/26941899.2024.2309403
Affiliations: Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, USA (Ganzhong Tian, John Hanfelt, Benjamin B. Risk); Department of Neurology, Emory University School of Medicine, Atlanta, Georgia, USA (James Lah)

Abstract: There is no gold standard for the diagnosis of Alzheimer’s disease (AD), except for autopsies, which motivates the use of unsupervised learning. A mixture of regressions is an unsupervised method that can simultaneously identify clusters from multiple biomarkers while learning within-cluster demographic effects. Cerebrospinal fluid (CSF) biomarkers for AD have detection limits, which create additional challenges. We apply a mixture of regressions with a multivariate truncated Gaussian distribution (also called a censored multivariate Gaussian mixture of regressions or a mixture of multivariate Tobit regressions) to over 3000 participants from the Emory Goizueta Alzheimer’s Disease Research Center and Emory Healthy Brain Study to examine amyloid-beta peptide 1–42 (Abeta42), total tau protein and phosphorylated tau protein in CSF with known detection limits. We address three gaps in the literature on the mixture of regressions with a truncated multivariate Gaussian distribution: software availability; inference; and clustering accuracy. We discovered three clusters that tend to align with an AD group, a normal control profile, and non-AD pathology. The CSF profiles differed by race, gender, and the genetic marker ApoE4, highlighting the importance of considering demographic factors in unsupervised learning with detection limits. Notably, African American participants in the AD-like group had significantly lower tau burden.
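
The abstract names the model without spelling out its likelihood. The Python sketch below is a simplified illustration, not the authors' software. It assumes the biomarkers are conditionally independent within a cluster (a diagonal covariance), that the mixing proportions are constant, and that censored values are recorded at the detection limit itself. Under those assumptions, the censored multivariate Gaussian factorizes into univariate Tobit terms: a normal density for values inside the assay range and a normal CDF for values at a limit. The functions `tobit_loglik` and `responsibilities`, and the toy numbers, are hypothetical. The model described in the abstract allows a general covariance, which instead requires multivariate normal CDFs over the censored coordinates.

```python
import math
from typing import List, Sequence, Tuple


def log_norm_pdf(y: float, mu: float, sigma: float) -> float:
    """Log density of N(mu, sigma^2) at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def log_norm_cdf(z: float) -> float:
    """log Phi(z) via erfc; floored at 1e-300 so extreme z never hits log(0)."""
    return math.log(max(0.5 * math.erfc(-z / math.sqrt(2.0)), 1e-300))


def tobit_loglik(y: float, mu: float, sigma: float, lower: float, upper: float) -> float:
    """Log-likelihood of one biomarker value with detection limits [lower, upper].

    Assumes values outside the assay range were recorded at the limit itself.
    """
    if y <= lower:  # left-censored: true value is at most `lower`
        return log_norm_cdf((lower - mu) / sigma)
    if y >= upper:  # right-censored: true value is at least `upper`
        return log_norm_cdf((mu - upper) / sigma)
    return log_norm_pdf(y, mu, sigma)  # observed inside the assay range


def responsibilities(
    y: Sequence[float],
    x: Sequence[float],
    weights: Sequence[float],
    betas: Sequence[Sequence[Sequence[float]]],
    sigmas: Sequence[Sequence[float]],
    lower: Sequence[float],
    upper: Sequence[float],
) -> Tuple[List[float], float]:
    """E-step for one participant (hypothetical helper, diagonal covariance).

    betas[k][j] is the coefficient vector for cluster k and biomarker j;
    x holds the participant's covariates with a leading 1 for the intercept.
    Returns posterior cluster probabilities and the participant's log-likelihood.
    """
    log_terms = []
    for k, pi_k in enumerate(weights):
        lp = math.log(pi_k)
        for j, y_j in enumerate(y):
            mu = sum(b * xi for b, xi in zip(betas[k][j], x))
            lp += tobit_loglik(y_j, mu, sigmas[k][j], lower[j], upper[j])
        log_terms.append(lp)
    m = max(log_terms)
    total = m + math.log(sum(math.exp(t - m) for t in log_terms))  # log-sum-exp
    return [math.exp(t - total) for t in log_terms], total


if __name__ == "__main__":
    # Toy example: two clusters, two biomarkers, intercept plus one covariate.
    weights = [0.5, 0.5]
    betas = [[[0.0, 0.0], [0.0, 0.0]], [[3.0, 0.0], [3.0, 0.0]]]
    sigmas = [[1.0, 1.0], [1.0, 1.0]]
    lower, upper = [-10.0, -10.0], [2.5, 2.5]
    # Both values sit at the upper detection limit, so both are right-censored.
    post, ll = responsibilities([2.5, 2.5], [1.0, 0.3], weights, betas, sigmas, lower, upper)
    print(post, ll)
```

In the toy run, the cluster whose mean lies above the upper limit receives nearly all of the posterior weight, because a censored value only says the true level is at least the limit. In an EM fit, these responsibilities would weight the cluster-specific Tobit regressions in the M-step, which is omitted here.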