A Calibrated Ensemble Algorithm to Address Data Heterogeneity in Machine Learning: An Application to Identify Severe SLE Flares in Lupus Patients
Main Authors: | Yijun Zhao, Man Qin, April Jorge |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2022-01-01 |
Series: | IEEE Access |
Subjects: | Data heterogeneity; ensemble learning; machine learning; lupus; SLE |
Online Access: | https://ieeexplore.ieee.org/document/9706189/ |
_version_ | 1819277401907003392 |
---|---|
author | Yijun Zhao; Man Qin; April Jorge |
author_facet | Yijun Zhao; Man Qin; April Jorge |
author_sort | Yijun Zhao |
collection | DOAJ |
description | Motivated to address the inconsistency between the essential i.i.d. assumption in machine learning theory and the data heterogeneity in real-world applications, we propose a novel calibrated ensemble (CE) algorithm to facilitate learning with diverse data subgroups. Unlike the traditional ensemble framework in which each learner is trained independently using the entire dataset, our method exploits the strengths of various machine learning models by training them simultaneously and forming model-ergonomic data subgroups as part of the training process. Consequently, each learner is calibrated to a unique subset of data based on their individualized predictive strength. Clinically, we can interpret each model as an expert specializing in treating patients with particular disease manifestations. We evaluate the CE model in our motivating domain of identifying lupus patients with severe SLE flares using 1541 clinical encounters in the Mass General Brigham (MGB) Lupus Cohort. Our experimental results demonstrate the efficacy of our CE model across seven performance evaluation metrics compared to five individual machine learning models and regular ensemble approaches. We further utilize ANOVA and Tukey HSD post-hoc statistical analysis to discover characteristic features of individual model clusters for clinical interpretations. |
first_indexed | 2024-12-23T23:55:32Z |
format | Article |
id | doaj.art-00d2dc37b6f746cf999e753e8846637d |
institution | Directory of Open Access Journals |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-23T23:55:32Z |
publishDate | 2022-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-00d2dc37b6f746cf999e753e8846637d; 2022-12-21T17:25:15Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2022-01-01; vol. 10, pp. 18720-18729; DOI 10.1109/ACCESS.2022.3149477; IEEE document 9706189; A Calibrated Ensemble Algorithm to Address Data Heterogeneity in Machine Learning: An Application to Identify Severe SLE Flares in Lupus Patients; Yijun Zhao (https://orcid.org/0000-0003-2424-5988), Department of Computer and Information Sciences, Fordham University, New York, NY, USA; Man Qin (https://orcid.org/0000-0003-4079-1487), Department of Computer and Information Sciences, Fordham University, New York, NY, USA; April Jorge (https://orcid.org/0000-0001-6935-880X), Division of Rheumatology, Allergy, and Immunology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA; https://ieeexplore.ieee.org/document/9706189/; Data heterogeneity; ensemble learning; machine learning; lupus; SLE |
title | A Calibrated Ensemble Algorithm to Address Data Heterogeneity in Machine Learning: An Application to Identify Severe SLE Flares in Lupus Patients |
topic | Data heterogeneity; ensemble learning; machine learning; lupus; SLE |
url | https://ieeexplore.ieee.org/document/9706189/ |
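
The description above says that the learners are trained simultaneously and that each ends up calibrated to its own data subgroup, but this record does not give the procedure. The sketch below is therefore only an illustration of the general idea, not the authors' CE algorithm. It assumes an alternating loop (refit each learner on its current subgroup, then reassign each sample to the learner that predicts it best) and uses toy 1-D least-squares lines as the learners, whereas the paper describes several different machine learning models on a classification task. All names (`fit_line`, `squared_error`, `calibrate`) and the synthetic data are hypothetical.

```python
import random


def fit_line(points):
    """Least-squares fit of y = a*x + b to (x, y) pairs; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    if var_x == 0:  # all x equal: fall back to a constant predictor
        return 0.0, mean_y
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def squared_error(model, point):
    """Squared prediction error of a (slope, intercept) model on one point."""
    slope, intercept = model
    x, y = point
    return (y - (slope * x + intercept)) ** 2


def calibrate(points, n_learners=2, n_rounds=20, seed=0):
    """Alternate between refitting learners and reassigning samples.

    Assumption for illustration only: the abstract does not specify how
    subgroups are formed; this is a generic assign-and-refit loop.
    """
    rng = random.Random(seed)
    # Start from a random partition of the samples among the learners.
    assignment = [rng.randrange(n_learners) for _ in points]
    models = [(0.0, 0.0)] * n_learners
    for _ in range(n_rounds):
        # Step 1: refit each learner on the samples currently assigned to it.
        for k in range(n_learners):
            subgroup = [p for p, a in zip(points, assignment) if a == k]
            if subgroup:
                models[k] = fit_line(subgroup)
        # Step 2: move each sample to the learner with the lowest error on it.
        new_assignment = [
            min(range(n_learners), key=lambda k: squared_error(models[k], p))
            for p in points
        ]
        if new_assignment == assignment:  # partition is stable: stop early
            break
        assignment = new_assignment
    return models, assignment


# Synthetic heterogeneous data: two subgroups that follow different trends.
points = [(x, 2.0 * x) for x in range(10)] + [(x, 5.0 - x) for x in range(10)]
models, assignment = calibrate(points)

# Error when each sample is scored by its own learner vs. one global model.
total_error = sum(squared_error(models[a], p) for p, a in zip(points, assignment))
baseline_error = sum(squared_error(fit_line(points), p) for p in points)
```

In this toy setting, neither step of the loop can increase the total squared error, so `total_error` never exceeds `baseline_error`. The loop can still settle on a partition that differs from the true subgroups, and a deployed system would also need a rule for routing unseen samples to a learner, which this sketch leaves out.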