Supervised Parametric Learning in the Identification of Composite Biomarker Signatures of Type 1 Diabetes in Integrated Parallel Multi-Omics Datasets

Bibliographic Details
Main Authors: Jerry Bonnell, Oscar Alcazar, Brandon Watts, Peter Buchwald, Midhat H. Abdulreda, Mitsunori Ogihara
Format: Article
Language: English
Published: MDPI AG 2024-02-01
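Series: Biomedicines
Subjects: biomarker signatures, early diagnosis, integrated analysis, lipidomics, machine learning (ML), multi-view architecture
Online Access: https://www.mdpi.com/2227-9059/12/3/492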
collection DOAJ
description Background: Type 1 diabetes (T1D) is a devastating autoimmune disease, and its rising prevalence in the United States and around the world presents a critical public health problem. While some treatment options exist for patients already diagnosed, individuals who are considered at risk for developing T1D and are still in the early, asymptomatic stages of disease pathogenesis have no options for preventive intervention. This is because it remains difficult to determine their risk level and to predict with high confidence who will or will not progress to clinical diagnosis. Biomarkers that assess an individual's risk with high certainty could address this problem and inform decisions on early intervention, especially in children, for whom the burden of justifying treatment is high. Single-omics approaches (e.g., genomics, proteomics, metabolomics) have been applied to identify T1D biomarkers based on specific disturbances associated with the disease. However, reliable early biomarkers of T1D have remained elusive to date. To overcome this, we previously showed that parallel multi-omics provides a more comprehensive picture of the disease-associated disturbances and facilitates the identification of candidate T1D biomarkers.
Methods: This paper evaluated machine learning (ML), specifically data augmentation combined with supervised ML methods, as a means of improving the identification of salient patterns in the data and, ultimately, the extraction of novel biomarker candidates from integrated parallel multi-omics datasets with a limited number of samples. We also examined different stages of data integration (early, intermediate, and late) to assess at which stage supervised parametric models can learn under conditions of high dimensionality and variation in feature counts across the different omics. In the late integration scheme, we employed a multi-view ensemble comprising individual parametric models, each trained on a single omics dataset, to address the computational challenges posed by the high dimensionality and variation in feature counts across the different yet integrated multi-omics datasets.
Results: The multi-view ensemble improves the prediction of case vs. control and is the most successful at flagging a larger, consistent set of associated features when compared with chance models; these features may eventually be used downstream to identify a novel composite biomarker signature of T1D risk.
Conclusions: The current work demonstrates the utility of supervised ML in exploring integrated parallel multi-omics data in the ongoing quest for early T1D biomarkers. It reinforces the hope of identifying novel composite biomarker signatures of T1D risk via ML and of ultimately informing early treatment decisions in the face of the escalating global incidence of this debilitating disease.
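The late integration scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each single-omics model is an L2-regularized logistic regression and that the ensemble simply averages the per-view case probabilities (soft voting). The view names, feature counts, sample sizes, and the synthetic data generator `make_view` are all hypothetical and exist only so the example runs on its own.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=300, l2=0.01):
    """Fit one parametric model (L2-regularized logistic regression)
    on a single view by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, X):
    """Return the predicted probability of the 'case' class per sample."""
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def make_view(labels, n_features, n_informative, shift, rng):
    """Hypothetical synthetic view: the first n_informative features
    are shifted for cases (label 1); the rest are pure noise."""
    return [[rng.gauss(shift * yi if j < n_informative else 0.0, 1.0)
             for j in range(n_features)] for yi in labels]


def late_integration_predict(models, views):
    """Late integration: average the per-view case probabilities
    (soft voting) and threshold the average at 0.5."""
    per_view = [predict_proba(m, X) for m, X in zip(models, views)]
    avg = [sum(p) / len(p) for p in zip(*per_view)]
    return avg, [1 if p >= 0.5 else 0 for p in avg]


rng = random.Random(0)
y_train = [i % 2 for i in range(40)]  # 0 = control, 1 = case
y_test = [i % 2 for i in range(20)]

# Assumed views with deliberately different feature counts:
# name -> (number of features, number of informative features).
view_specs = {"proteomics": (30, 3), "lipidomics": (12, 2), "metabolomics": (20, 3)}
train_views = [make_view(y_train, d, k, 1.5, rng) for d, k in view_specs.values()]
test_views = [make_view(y_test, d, k, 1.5, rng) for d, k in view_specs.values()]

# One model per omics view; no view ever sees another view's features.
models = [fit_logistic(X, y_train) for X in train_views]
probs, preds = late_integration_predict(models, test_views)
accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
print(f"ensemble accuracy on held-out synthetic samples: {accuracy:.2f}")
```

Because each model is fitted only on its own view, the differing feature counts never have to be reconciled in a single design matrix, which is the practical appeal of late integration over concatenating all omics up front. Averaging probabilities is only one possible combination rule; the record does not state which rule the paper uses.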
first_indexed 2024-04-24T18:32:55Z
format Article
id doaj.art-ffa67aaa54fc4a30a58171bdc8629ce7
institution Directory Open Access Journal
issn 2227-9059
language English
last_indexed 2024-04-24T18:32:55Z
publishDate 2024-02-01
publisher MDPI AG
record_format Article
series Biomedicines
spelling doaj.art-ffa67aaa54fc4a30a58171bdc8629ce7; 2024-03-27T13:22:31Z; eng; MDPI AG; Biomedicines; ISSN 2227-9059; 2024-02-01; vol. 12, no. 3, article 492; DOI 10.3390/biomedicines12030492
Jerry Bonnell: Frost Institute for Data Science and Computing, University of Miami, Coral Gables, FL 33146, USA
Oscar Alcazar: Diabetes Research Institute, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
Brandon Watts: Diabetes Research Institute, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
Peter Buchwald: Diabetes Research Institute, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
Midhat H. Abdulreda: Diabetes Research Institute, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
Mitsunori Ogihara: Frost Institute for Data Science and Computing, University of Miami, Coral Gables, FL 33146, USA
title Supervised Parametric Learning in the Identification of Composite Biomarker Signatures of Type 1 Diabetes in Integrated Parallel Multi-Omics Datasets
topic biomarker signatures
early diagnosis
integrated analysis
lipidomics
machine learning (ML)
multi-view architecture
url https://www.mdpi.com/2227-9059/12/3/492