Machine Learning for Data Center Optimizations: Feature Selection Using Shapley Additive exPlanation (SHAP)

The need for artificial intelligence (AI) and machine learning (ML) models to optimize data center (DC) operations grows as the volume of operations management data increases rapidly. These strategies can assist operators in better understanding their DC operations and help them make informe...


Bibliographic Details
Main Authors: Yibrah Gebreyesus, Damian Dalton, Sebastian Nixon, Davide De Chiara, Marta Chinnici
Format: Article
Language: English
Published: MDPI AG 2023-02-01
Series: Future Internet
Subjects: data center; artificial intelligence; machine learning; feature selection; SHAP; game theory
Online Access: https://www.mdpi.com/1999-5903/15/3/88
collection DOAJ
description The need for artificial intelligence (AI) and machine learning (ML) models to optimize data center (DC) operations grows as the volume of operations management data increases rapidly. These strategies can assist operators in better understanding their DC operations and help them make informed decisions in advance to maintain service reliability and availability. The strategies include developing models that optimize energy efficiency, identifying inefficient resource utilization and scheduling policies, and predicting outages. In addition to model hyperparameter tuning, feature subset selection (FSS) is critical for identifying the features relevant to modeling DC operations effectively: it provides insight into the data, improves model performance, and reduces computational expense. Hence, this paper introduces the Shapley Additive exPlanation (SHAP) values method, a class of additive feature attribution values for identifying relevant features that is rarely discussed in the literature. We compared its effectiveness with several commonly used, importance-based feature selection methods. The methods were tested on real DC operations data streams obtained from the ENEA CRESCO6 cluster with 20,832 cores. To demonstrate the effectiveness of SHAP compared to other methods, we selected the top ten most important features from each method, retrained the predictive models, and evaluated their performance using the MAE, RMSE, and MAPE evaluation criteria. The results presented in this paper demonstrate that the predictive models trained using features selected with the SHAP-assisted method performed well, with a lower error and a reasonable execution time compared to other methods.
first_indexed 2024-03-11T06:31:06Z
id doaj.art-7add61ec9eab4fdb8ad3b3b4cdcb3194
institution Directory Open Access Journal
issn 1999-5903
last_indexed 2024-03-11T06:31:06Z
spelling doaj.art-7add61ec9eab4fdb8ad3b3b4cdcb3194, 2023-11-17T11:12:56Z, eng, MDPI AG, Future Internet, ISSN 1999-5903, 2023-02-01, vol. 15, no. 3, article 88, doi:10.3390/fi15030088
affiliations Yibrah Gebreyesus: School of Computer Science, University College of Dublin, D04 V1W8 Dublin, Ireland
Damian Dalton: School of Computer Science, University College of Dublin, D04 V1W8 Dublin, Ireland
Sebastian Nixon: School of Computer Science, Wolaita Sodo University, Wolaita P.O. Box 138, Ethiopia
Davide De Chiara: ENEA-R.C. Portici, 80055 Portici (NA), Italy
Marta Chinnici: ENEA-R.C. Casaccia, 00196 Rome, Italy
title Machine Learning for Data Center Optimizations: Feature Selection Using Shapley Additive exPlanation (SHAP)
topic data center
artificial intelligence
machine learning
feature selection
SHAP
game theory
url https://www.mdpi.com/1999-5903/15/3/88
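
The description in this record outlines a procedure: rank features by SHAP value, keep the most important ones, and score the retrained models with MAE, RMSE, and MAPE. The sketch below illustrates that idea on a toy scale and is not the authors' implementation. It computes exact Shapley values by enumerating feature subsets, which is feasible only for a handful of features, and it replaces "absent" features with a baseline value; both choices are assumptions made for illustration. The model weights, feature names, and sample values are hypothetical and do not come from the CRESCO6 data.

```python
from itertools import combinations
from math import factorial, sqrt


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample; absent features take baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def rank_features(predict, samples, baseline):
    """Rank feature indices by mean absolute Shapley value over the samples."""
    n = len(baseline)
    totals = [0.0] * n
    for x in samples:
        for i, value in enumerate(shapley_values(predict, x, baseline)):
            totals[i] += abs(value)
    mean_abs = [t / len(samples) for t in totals]
    order = sorted(range(n), key=lambda i: mean_abs[i], reverse=True)
    return order, mean_abs


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    # Percentage error; assumes no true value is zero.
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical feature names and a hypothetical linear model (illustration only).
feature_names = ["cpu_load", "inlet_temp", "fan_speed"]


def toy_model(features):
    return 3.0 * features[0] + 0.5 * features[1] + 0.0 * features[2]


samples = [[1.0, 2.0, 3.0], [2.0, 0.0, 1.0], [0.0, 1.0, 5.0]]
baseline = [1.0, 1.0, 3.0]  # per-feature mean of the samples

ranking, importance = rank_features(toy_model, samples, baseline)
top_k = ranking[:2]  # the paper keeps the top ten; two suffice for three features
selected = [feature_names[i] for i in top_k]
```

In practice the selected columns would be used to retrain the predictive model, and `mae`, `rmse`, and `mape` would be computed on held-out predictions. For realistic feature counts, an approximate SHAP estimator is needed instead of the exhaustive enumeration used here.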