Features and explainable methods for cytokines analysis of Dry Eye Disease in HIV infected patients

Clinical Decision Support Systems (CDSS) that use machine learning techniques and, in the broadest sense, artificial intelligence (AI) must be interpretable and transparent. A lack of transparency, rather than providing support, can become a source of indecision and an obstacle. In this work,...

Bibliographic Details
Main Author:	Francesco Curia
Format:	Article
Language:	English
Published:	Elsevier 2021-11-01
Series:	Healthcare Analytics
Subjects:	Clustering ensemble; Dry eye disease; Explainable artificial intelligence; Machine learning; Interpretable models; Features importance
Online Access:	http://www.sciencedirect.com/science/article/pii/S2772442521000010

_version_	1818393165529350144
author	Francesco Curia
collection	DOAJ
description	Clinical Decision Support Systems (CDSS) that use machine learning techniques and, in the broadest sense, artificial intelligence (AI) must be interpretable and transparent. A lack of transparency, rather than providing support, can become a source of indecision and an obstacle. This work tackles a complex and clinically important problem, Dry Eye Disease (DED), starting from a case-control study of a population comprising an HIV-positive group and a healthy group. The case study is approached on two fronts. First, an ensemble-based clustering algorithm is built. Second, this algorithm is broken down so that each component can be analyzed, making the analysis method transparent and interpretable. Specifically, an ensemble of clustering algorithms (k-means, agglomerative, spectral, and BIRCH) is combined and used at two levels. At the first level, labels are obtained from each clusterer to recognize significant patterns in the two populations affected by DED, with and without HIV. At the second level, the first-level labels are used as inputs to which the clusterers are applied again; their outputs then serve as a training data set for a supervised method (e.g., logistic regression, decision trees, neural networks), so that every component can be evaluated separately using feature importance techniques (e.g., decision trees, LASSO regression, Gini Importance (GI), Variable Importance (VI)). In this way, each clustering algorithm used at the first level can be treated as a new feature at the next level, and its individual contribution can be evaluated. Furthermore, each feature is interpreted through specific feature relevance methods to make the decision support tool as complete as possible.
The performance of the supervised and unsupervised methods used in training is evaluated with appropriate metrics, such as precision, recall, accuracy, and homogeneity. The clustering methods provide results on the groups created and on the influence of the features (cytokines) in the two populations examined. The experimental results on the association between the development of DED and the presence or absence of HIV in these patients, and on the influence that certain factors have on this problem, are interpreted with Explainable AI methods (e.g., Local Interpretable Model-agnostic Explanations (LIME), Shapley values, Individual Conditional Expectation (ICE)). Besides explaining the influence exerted by certain features, these methods provide both a global and a local view of how each factor influences the final probability associated with the possible development of the pathology. In practice, the method can support the clinical diagnosis of the patients examined by evaluating how each factor may be responsible for the possible development of the disease, so that each factor can be addressed individually in treatment. To date, the analytical techniques used to study this pathology have provided only generalized results, whereas breaking down the problem and isolating its components could provide valuable information to clinicians.
first_indexed	2024-12-14T05:40:59Z
format	Article
id	doaj.art-e1f1f1214d5e4f4c86ff900546ae2221
institution	Directory of Open Access Journals
issn	2772-4425
language	English
last_indexed	2024-12-14T05:40:59Z
publishDate	2021-11-01
publisher	Elsevier
record_format	Article
series	Healthcare Analytics
spelling	doaj.art-e1f1f1214d5e4f4c86ff900546ae2221 2022-12-21T23:15:01Z eng Elsevier Healthcare Analytics 2772-4425 2021-11-01 1 100001 Features and explainable methods for cytokines analysis of Dry Eye Disease in HIV infected patients Francesco Curia (Department of Statistical Science, Sapienza University of Rome, piazzale Aldo Moro 5, 00185, Rome, Italy) http://www.sciencedirect.com/science/article/pii/S2772442521000010
title	Features and explainable methods for cytokines analysis of Dry Eye Disease in HIV infected patients
topic	Clustering ensemble; Dry eye disease; Explainable artificial intelligence; Machine learning; Interpretable models; Features importance
url	http://www.sciencedirect.com/science/article/pii/S2772442521000010
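
The description field mentions Shapley-based explanations as one way to obtain a local view of how each factor influences the final probability. The following is a minimal sketch of that idea, not the article's implementation: it computes exact Shapley values for a toy logistic model by averaging each feature's marginal contribution over all feature orderings. The feature names (`cytokine_a`, `cytokine_b`, `cytokine_c`), the coefficients, the patient values, and the all-zero baseline are hypothetical and chosen only for illustration; none of them come from the study.

```python
from itertools import permutations
from math import exp, factorial

# Hypothetical logistic model: feature names and coefficients are invented
# for illustration and are not taken from the article.
WEIGHTS = {"cytokine_a": 1.2, "cytokine_b": -0.7, "cytokine_c": 0.4}
INTERCEPT = -0.5


def predict_proba(features):
    """Return the probability given by the toy logistic model."""
    score = INTERCEPT + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + exp(-score))


def shapley_values(instance, baseline):
    """Exact Shapley values via enumeration of all feature orderings.

    Features not yet revealed in an ordering keep their baseline value.
    """
    names = list(instance)
    contributions = {name: 0.0 for name in names}
    for order in permutations(names):
        current = dict(baseline)
        previous = predict_proba(current)
        for name in order:
            current[name] = instance[name]  # reveal this feature's real value
            updated = predict_proba(current)
            contributions[name] += updated - previous
            previous = updated
    n_orderings = factorial(len(names))
    return {name: total / n_orderings for name, total in contributions.items()}


# Made-up patient and baseline values (assumptions, not study data).
patient = {"cytokine_a": 1.5, "cytokine_b": 0.2, "cytokine_c": -1.0}
reference = {"cytokine_a": 0.0, "cytokine_b": 0.0, "cytokine_c": 0.0}

phi = shapley_values(patient, reference)
for name, value in phi.items():
    print(f"{name}: {value:+.4f}")
```

The per-feature values sum to the difference between the patient's predicted probability and the baseline prediction, which is what makes the attribution a local, additive explanation. Exhaustive enumeration is only practical for a handful of features; libraries used in practice rely on approximations.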

Similar Items