Features and explainable methods for cytokines analysis of Dry Eye Disease in HIV infected patients

Clinical Decision Support Systems (CDSS) that use machine learning, and artificial intelligence (AI) in its broadest sense, must be interpretable and transparent. A lack of transparency can turn a support tool into a source of indecision and an obstacle. In this work,...


Bibliographic Details
Main Author: Francesco Curia
Format: Article
Language: English
Published: Elsevier 2021-11-01
Series: Healthcare Analytics
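Subjects: Clustering ensemble; Dry eye disease; Explainable artificial intelligence; Machine learning; Interpretable models; Features importance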
Online Access: http://www.sciencedirect.com/science/article/pii/S2772442521000010
author Francesco Curia
collection DOAJ
description Clinical Decision Support Systems (CDSS) that use machine learning, and artificial intelligence (AI) in its broadest sense, must be interpretable and transparent. A lack of transparency can turn a support tool into a source of indecision and an obstacle. This work tackles a complex and clinically important problem, the pathology known as Dry Eye Disease (DED), starting from a case-control study on an HIV-positive population and a healthy part of it. The case study is approached on two fronts. First, an ensemble-based clustering algorithm is built. Second, this algorithm is broken down to analyze each component, making the analysis method transparent and interpretable. Specifically, an ensemble of clustering algorithms (k-means, agglomerative, spectral, and BIRCH) is combined and used at two levels. At the first level, labels are obtained from each clustering algorithm to recognize significant patterns in the two populations affected by DED, with and without HIV. At the second level, the first-level labels are used as inputs on which the clustering algorithms are run again, and their outputs serve in the final phase as a training data set for a supervised method (e.g., logistic regression, decision trees, neural networks), so that every component can be evaluated separately through feature importance techniques (e.g., decision trees, LASSO regression, Gini Importance (GI), Variable Importance (VI)). In this way, each clustering algorithm used at the first level can be treated as a new feature at the next level, and its individual contribution can be evaluated. Furthermore, each feature is interpreted through specific feature relevance methods to make the decision support tool as complete as possible.
The performance of the supervised and unsupervised methods is evaluated through appropriate metrics, such as precision, recall, accuracy, and homogeneity. The clustering methods provide results on the groups created and on the influence of the features (cytokines) in the two populations examined. The experimental results on the association between the development of DED and the presence or absence of HIV in these patients, and on the influence that certain factors have on this problem, are interpreted with methods from the branch known as Explainable AI (e.g., Local Interpretable Model-agnostic Explanations (LIME), Shapley values, Individual Conditional Expectation (ICE)). Besides explaining the influence exerted by certain features, these methods provide both a global and a local view of how each factor influences the final probability associated with the possible development of the pathology. In practice, the method can support clinical diagnosis by showing how each factor may be responsible for the possible development of the disease in the patients examined, so that each factor can be addressed individually in treatment. To date, the analytical techniques used in the study of this pathology have provided only generalized results, whereas breaking down the problem and isolating its components could provide valuable information to clinical operators.
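The evaluation metrics named in the description can be illustrated with a small sketch. This is not the author's code: it assumes that "homogeneity" means the entropy-based score h = 1 - H(C|K) / H(C) (the definition also used by scikit-learn's `homogeneity_score`), and the function names and toy labels below are hypothetical.

```python
import math
from collections import Counter


def homogeneity(classes, clusters):
    """Entropy-based homogeneity, h = 1 - H(C|K) / H(C).

    Assumed definition (not confirmed by the record): a score of 1.0
    means every cluster contains members of a single reference class.
    """
    n = len(classes)
    class_counts = Counter(classes)
    cluster_counts = Counter(clusters)
    joint_counts = Counter(zip(classes, clusters))
    # Entropy of the reference classes, H(C).
    h_c = -sum((m / n) * math.log(m / n) for m in class_counts.values())
    if h_c == 0.0:
        return 1.0  # a single reference class is trivially homogeneous
    # Conditional entropy of the classes given the clusters, H(C|K).
    h_c_given_k = -sum(
        (m / n) * math.log(m / cluster_counts[k])
        for (_, k), m in joint_counts.items()
    )
    return 1.0 - h_c_given_k / h_c


def binary_scores(y_true, y_pred, positive=1):
    """Return (precision, recall, accuracy) for a binary labelling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs)
    return precision, recall, accuracy


if __name__ == "__main__":
    # Hypothetical toy labels: 1 = HIV-positive, 0 = HIV-negative.
    reference = [0, 0, 1, 1]
    print(homogeneity(reference, [0, 0, 1, 1]))  # clusters match the classes
    print(homogeneity(reference, [0, 1, 0, 1]))  # clusters mix the classes
    print(binary_scores([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

Note that homogeneity alone rewards over-splitting (one cluster per sample is perfectly homogeneous), which is one reason it is usually reported alongside other measures such as those listed in the description.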
first_indexed 2024-12-14T05:40:59Z
format Article
id doaj.art-e1f1f1214d5e4f4c86ff900546ae2221
institution Directory Open Access Journal
issn 2772-4425
language English
last_indexed 2024-12-14T05:40:59Z
publishDate 2021-11-01
publisher Elsevier
record_format Article
series Healthcare Analytics
spelling doaj.art-e1f1f1214d5e4f4c86ff900546ae2221; 2022-12-21T23:15:01Z; eng; Elsevier; Healthcare Analytics; 2772-4425; 2021-11-01; 1100001; Features and explainable methods for cytokines analysis of Dry Eye Disease in HIV infected patients; Francesco Curia (Department of Statistical Science, Sapienza University of Rome, piazzale Aldo Moro 5, 00185, Rome, Italy); http://www.sciencedirect.com/science/article/pii/S2772442521000010; Clustering ensemble; Dry eye disease; Explainable artificial intelligence; Machine learning; Interpretable models; Features importance
title Features and explainable methods for cytokines analysis of Dry Eye Disease in HIV infected patients
topic Clustering ensemble
Dry eye disease
Explainable artificial intelligence
Machine learning
Interpretable models
Features importance
url http://www.sciencedirect.com/science/article/pii/S2772442521000010