Machine Learning for Data Center Optimizations: Feature Selection Using Shapley Additive exPlanation (SHAP)
The need for artificial intelligence (AI) and machine learning (ML) models to optimize data center (DC) operations increases as the volume of operations management data upsurges tremendously. These strategies can assist operators in better understanding their DC operations and help them make informe...
Main Authors: | Yibrah Gebreyesus, Damian Dalton, Sebastian Nixon, Davide De Chiara, Marta Chinnici |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-02-01 |
Series: | Future Internet |
Subjects: | data center, artificial intelligence, machine learning, feature selection, SHAP, game theory |
Online Access: | https://www.mdpi.com/1999-5903/15/3/88 |
_version_ | 1797611608169512960 |
---|---|
author | Yibrah Gebreyesus, Damian Dalton, Sebastian Nixon, Davide De Chiara, Marta Chinnici |
author_facet | Yibrah Gebreyesus, Damian Dalton, Sebastian Nixon, Davide De Chiara, Marta Chinnici |
author_sort | Yibrah Gebreyesus |
collection | DOAJ |
description | The need for artificial intelligence (AI) and machine learning (ML) models to optimize data center (DC) operations increases as the volume of operations management data upsurges tremendously. These strategies can assist operators in better understanding their DC operations and help them make informed decisions upfront to maintain service reliability and availability. The strategies include developing models that optimize energy efficiency, identifying inefficient resource utilization and scheduling policies, and predicting outages. In addition to model hyperparameter tuning, feature subset selection (FSS) is critical for identifying relevant features for effectively modeling DC operations to provide insight into the data, optimize model performance, and reduce computational expenses. Hence, this paper introduces the Shapley Additive exPlanation (SHAP) values method, a class of additive feature attribution values for identifying relevant features that is rarely discussed in the literature. We compared its effectiveness with several commonly used, importance-based feature selection methods. The methods were tested on real DC operations data streams obtained from the ENEA CRESCO6 cluster with 20,832 cores. To demonstrate the effectiveness of SHAP compared to other methods, we selected the top ten most important features from each method, retrained the predictive models, and evaluated their performance using the MAE, RMSE, and MPAE evaluation criteria. The results presented in this paper demonstrate that the predictive models trained using features selected with the SHAP-assisted method performed well, with a lower error and a reasonable execution time compared to other methods. |
first_indexed | 2024-03-11T06:31:06Z |
format | Article |
id | doaj.art-7add61ec9eab4fdb8ad3b3b4cdcb3194 |
institution | Directory Open Access Journal |
issn | 1999-5903 |
language | English |
last_indexed | 2024-03-11T06:31:06Z |
publishDate | 2023-02-01 |
publisher | MDPI AG |
record_format | Article |
series | Future Internet |
spelling | doaj.art-7add61ec9eab4fdb8ad3b3b4cdcb3194; 2023-11-17T11:12:56Z; eng; MDPI AG; Future Internet; ISSN 1999-5903; 2023-02-01; 15(3), 88; DOI 10.3390/fi15030088; Machine Learning for Data Center Optimizations: Feature Selection Using Shapley Additive exPlanation (SHAP); Yibrah Gebreyesus (School of Computer Science, University College of Dublin, D04 V1W8 Dublin, Ireland); Damian Dalton (School of Computer Science, University College of Dublin, D04 V1W8 Dublin, Ireland); Sebastian Nixon (School of Computer Science, Wolaita Sodo University, Wolaita P.O. Box 138, Ethiopia); Davide De Chiara (ENEA-R.C. Portici, 80055 Portici (NA), Italy); Marta Chinnici (ENEA-R.C. Casaccia, 00196 Rome, Italy); https://www.mdpi.com/1999-5903/15/3/88; data center, artificial intelligence, machine learning, feature selection, SHAP, game theory |
title | Machine Learning for Data Center Optimizations: Feature Selection Using Shapley Additive exPlanation (SHAP) |
topic | data center, artificial intelligence, machine learning, feature selection, SHAP, game theory |
url | https://www.mdpi.com/1999-5903/15/3/88 |
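The selection procedure summarized in the description (rank features by SHAP value, keep the top k, then score predictions with MAE and RMSE) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy linear model `predict`, the sample `rows`, the mean-value baseline, and all function names are hypothetical, and the retraining step is omitted. Shapley values are computed exactly by enumerating feature coalitions, which is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial, sqrt


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample.

    Features outside a coalition are replaced by their baseline value
    (an assumption; other SHAP variants treat absent features differently).
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def top_k_features(predict, rows, k):
    """Rank features by mean absolute Shapley value and return the top k indices."""
    n = len(rows[0])
    baseline = [sum(r[j] for r in rows) / len(rows) for j in range(n)]
    importance = [0.0] * n
    for r in rows:
        for j, value in enumerate(shapley_values(predict, r, baseline)):
            importance[j] += abs(value) / len(rows)
    return sorted(range(n), key=lambda j: importance[j], reverse=True)[:k]


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical toy model: feature 2 has no influence on the prediction.
def predict(v):
    return 3.0 * v[0] + 0.5 * v[1] + 0.0 * v[2] - 2.0 * v[3]


rows = [[1, 2, 3, 4], [2, 1, 0, 3], [3, 5, 1, 1], [4, 0, 2, 2]]
selected = top_k_features(predict, rows, k=2)
```

In a full workflow, the model would be retrained on the `selected` columns only and the retrained model's predictions compared against held-out targets with `mae` and `rmse`.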