Survival analysis of patient groups defined by unsupervised machine learning clustering methods based on patient metabolomic data.

Purpose: Meta-analyses have failed to accurately identify patients with non-metastatic breast cancer who are likely to benefit from chemotherapy, and metabolomics could provide new answers. In our previously published work, patients were clustered using five different unsupervised machine learning (ML) met...

Bibliographic Details
Main Authors: Caroline Bailleux, David Chardin, Jean-Marie Guigonis, Jean-Marc Ferrero, Yann Chateau, Olivier Humbert, Thierry Pourcher, Jocelyn Gal
Format: Article
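Language: English
Published: Elsevier 2023-01-01
Series: Computational and Structural Biotechnology Journal
Subjects: Unsupervised machine learning; Clustering; Breast cancer; Survival; Untargeted; Proof-of-concept
Online Access: http://www.sciencedirect.com/science/article/pii/S2001037023003884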
author Caroline Bailleux
David Chardin
Jean-Marie Guigonis
Jean-Marc Ferrero
Yann Chateau
Olivier Humbert
Thierry Pourcher
Jocelyn Gal
collection DOAJ
description Purpose: Meta-analyses have failed to accurately identify patients with non-metastatic breast cancer (BC) who are likely to benefit from chemotherapy, and metabolomics could provide new answers. In our previously published work, patients were clustered using five different unsupervised machine learning (ML) methods, which identified three clusters with distinct clinical and simulated survival data. The objective of this study was to evaluate survival outcomes, with extended follow-up, using the same five unsupervised ML methods. Experimental design: Forty-nine patients with non-metastatic BC, diagnosed between 2013 and 2016, were included retrospectively. Median follow-up was extended to 85.8 months. A total of 449 metabolites were extracted from tumor resection samples by combined liquid chromatography-mass spectrometry (LC–MS). Survival analyses compared clusters 1 and 2, grouped together, with cluster 3. Bootstrap optimization was applied. Results: PCA k-means, K-sparse and Spectral clustering were the most effective methods for predicting 2-year progression-free survival with bootstrap optimization (PFSb). For example, with the PCA k-means method, PFSb was 94% for clusters 1 and 2 versus 82% for cluster 3 (p = 0.01). The PCA k-means method performed best, with higher reproducibility (mean HR = 2, 95% CI [1.4–2.7]; probability of p ≤ 0.05: 85%). Cancer-specific survival (CSS) and overall survival (OS) analyses highlighted discrepancies among the five unsupervised ML methods. Conclusion: Our study is a proof of principle that unsupervised ML methods can be applied to metabolomic data to predict PFS outcomes, with the best performance for PCA k-means. A larger population study is needed to draw conclusions from the CSS and OS analyses.
first_indexed 2024-03-08T21:30:13Z
format Article
id doaj.art-816f939c016a42129efa98d1890845fc
institution Directory Open Access Journal
issn 2001-0370
language English
last_indexed 2024-03-08T21:30:13Z
publishDate 2023-01-01
publisher Elsevier
record_format Article
series Computational and Structural Biotechnology Journal
spelling doaj.art-816f939c016a42129efa98d1890845fc; 2023-12-21T07:32:22Z; eng; Elsevier; Computational and Structural Biotechnology Journal; ISSN 2001-0370; 2023-01-01; volume 21; pages 5136–5143
Caroline Bailleux: University Côte d'Azur, Centre Antoine Lacassagne, Medical Oncology Department, Nice F-06189, France; University Côte d'Azur, Commissariat à l'Energie Atomique et aux énergies alternatives, Institut Frédéric Joliot, Service Hospitalier Frédéric Joliot, laboratory Transporters in Oncology and Radiotherapy in Oncology (TIRO), School of medicine, Nice F-06100, France; Correspondence to: Medical Oncology Department, Centre Antoine Lacassagne, University Côte d'Azur, 33 avenue de Valombrose, 06189 Nice, France.
David Chardin: University Côte d'Azur, Centre Antoine Lacassagne, Nuclear medicine Department, Nice F-06189, France; University Côte d'Azur, Commissariat à l'Energie Atomique et aux énergies alternatives, Institut Frédéric Joliot, Service Hospitalier Frédéric Joliot, laboratory Transporters in Oncology and Radiotherapy in Oncology (TIRO), School of medicine, Nice F-06100, France
Jean-Marie Guigonis: University Côte d'Azur, Commissariat à l'Energie Atomique et aux énergies alternatives, Institut Frédéric Joliot, Service Hospitalier Frédéric Joliot, laboratory Transporters in Oncology and Radiotherapy in Oncology (TIRO), School of medicine, Nice F-06100, France
Jean-Marc Ferrero: University Côte d'Azur, Centre Antoine Lacassagne, Medical Oncology Department, Nice F-06189, France
Yann Chateau: University Côte d'Azur, Centre Antoine Lacassagne, Epidemiology and Biostatistics Department, Nice F-06189, France
Olivier Humbert: University Côte d'Azur, Centre Antoine Lacassagne, Nuclear medicine Department, Nice F-06189, France; University Côte d'Azur, Commissariat à l'Energie Atomique et aux énergies alternatives, Institut Frédéric Joliot, Service Hospitalier Frédéric Joliot, laboratory Transporters in Oncology and Radiotherapy in Oncology (TIRO), School of medicine, Nice F-06100, France
Thierry Pourcher: University Côte d'Azur, Commissariat à l'Energie Atomique et aux énergies alternatives, Institut Frédéric Joliot, Service Hospitalier Frédéric Joliot, laboratory Transporters in Oncology and Radiotherapy in Oncology (TIRO), School of medicine, Nice F-06100, France
Jocelyn Gal: University Côte d'Azur, Centre Antoine Lacassagne, Epidemiology and Biostatistics Department, Nice F-06189, France
title Survival analysis of patient groups defined by unsupervised machine learning clustering methods based on patient metabolomic data.
topic Unsupervised machine learning
Clustering
Breast cancer
Survival
Untargeted
Proof-of-concept
url http://www.sciencedirect.com/science/article/pii/S2001037023003884
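
Illustrative note: the description reports 2-year progression-free survival estimated with a bootstrap, but this record does not say how the bootstrap was carried out. The sketch below is therefore only a generic illustration, not the authors' procedure: a Kaplan-Meier estimate of survival at a fixed horizon, resampled with a percentile bootstrap. The function names (km_survival, bootstrap_km) and the toy follow-up data are hypothetical and are not taken from the study.

```python
import random


def km_survival(times, events, horizon):
    """Kaplan-Meier estimate of survival at `horizon`.

    times  : follow-up per patient (months)
    events : 1 if progression was observed, 0 if censored
    """
    survival = 1.0
    # Distinct observed event times up to the horizon, in increasing order.
    event_times = sorted(
        {t for t, e in zip(times, events) if e == 1 and t <= horizon}
    )
    for t in event_times:
        at_risk = sum(1 for ti in times if ti >= t)
        failed = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        survival *= 1.0 - failed / at_risk
    return survival


def bootstrap_km(times, events, horizon, n_boot=1000, seed=0):
    """Mean and 95% percentile interval of the estimate over resamples."""
    rng = random.Random(seed)
    n = len(times)
    estimates = []
    for _ in range(n_boot):
        # Resample patients with replacement, keeping (time, event) pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(
            km_survival([times[i] for i in idx], [events[i] for i in idx], horizon)
        )
    estimates.sort()
    lower = estimates[int(0.025 * n_boot)]
    upper = estimates[int(0.975 * n_boot) - 1]
    return sum(estimates) / n_boot, (lower, upper)


# Toy data invented for illustration only (not patient data).
toy_times = [6, 12, 18, 30, 40, 50, 60, 70]
toy_events = [1, 0, 1, 0, 0, 1, 0, 0]

point_estimate = km_survival(toy_times, toy_events, horizon=24)
boot_mean, boot_interval = bootstrap_km(toy_times, toy_events, horizon=24)
print(point_estimate, boot_mean, boot_interval)
```

In a setting like the one described, such an estimate would be computed separately for each cluster grouping (for instance clusters 1 and 2 versus cluster 3) and the resampled values compared. With only a few dozen patients the intervals are expected to be wide.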