Classification and Explanation for Intrusion Detection System Based on Ensemble Trees and SHAP Method

Bibliographic Details
Main Authors: Thi-Thu-Huong Le, Haeyoung Kim, Hyoeun Kang, Howon Kim
Format: Article
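Language: English
Published: MDPI AG 2022-02-01
Series: Sensors
Subjects: decision tree; ensemble trees; explanation AI (XAI); intrusion detection systems (IDS); random forest; SHapley Additive exPlanations (SHAP)
Online Access: https://www.mdpi.com/1424-8220/22/3/1154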
Collection: DOAJ (Directory of Open Access Journals)
Description: In recent years, the research community has designed and developed many methods for intrusion detection systems (IDS), many of which have achieved perfect detection rates on IDS datasets. Deep neural networks (DNNs) are representative examples and are widely applied in IDS. However, DNN models are becoming increasingly complex, and their architectures demand substantial hardware computing resources. In addition, it is difficult for humans to understand the reasons behind the decisions these DNN models make on large IoT-based IDS datasets. Many proposed IDS methods have not been deployed in practice because they do not give cybersecurity experts explanations that would support them in optimizing their decisions according to the judgments of the IDS models. This paper aims to improve the attack detection performance of IDS on large IoT-based IDS datasets and to explain the predictions of machine learning (ML) models. The proposed ML-based IDS method follows the ensemble trees approach, using decision tree (DT) and random forest (RF) classifiers, which do not require high computing resources for training. Two large datasets, NF-BoT-IoT-v2 and NF-ToN-IoT-v2 (new versions of the original BoT-IoT and ToN-IoT datasets built on a NetFlow-based feature set), are used for the experimental evaluation of the proposed method, and the IoTDS20 dataset is also used in the experiments. Furthermore, SHapley Additive exPlanations (SHAP), an eXplainable AI (XAI) method, is applied to explain and interpret the classification decisions of the DT and RF models. This not only interprets the final decisions of the ensemble tree approach effectively but also supports cybersecurity experts in quickly optimizing and evaluating the correctness of their judgments based on the explanations of the results.
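The description pairs tree-based classifiers with Shapley-value explanations. As a minimal, self-contained sketch (not the authors' code and not the `shap` library), the Python below hand-writes three toy decision trees over hypothetical NetFlow-style features (`in_bytes`, `out_pkts`, `flow_duration_ms`, with made-up thresholds and probabilities), averages them the way a random forest averages its trees' class probabilities, and computes exact Shapley values against a single baseline flow by enumerating every feature coalition. The function names `tree_a`, `tree_b`, `tree_c`, `forest`, and `shapley_values` are illustrative only. In the paper the trees would be learned from data, and tree-specific SHAP algorithms avoid this brute-force enumeration, which is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial

# Hypothetical NetFlow-style features; names and thresholds below are
# illustrative only, not taken from the NF-BoT-IoT-v2 or NF-ToN-IoT-v2 schema.
FEATURES = ["in_bytes", "out_pkts", "flow_duration_ms"]


def tree_a(x):
    """Toy decision tree returning P(attack) for one flow record (a dict)."""
    if x["in_bytes"] > 1000:
        return 0.9 if x["out_pkts"] < 5 else 0.6
    return 0.2 if x["flow_duration_ms"] > 50 else 0.1


def tree_b(x):
    """Second toy tree; it never reads flow_duration_ms."""
    if x["out_pkts"] < 5:
        return 0.8
    return 0.3 if x["in_bytes"] > 1000 else 0.0


def tree_c(x):
    """Third toy tree."""
    if x["flow_duration_ms"] <= 50:
        return 0.7 if x["in_bytes"] > 1000 else 0.4
    return 0.1


TREES = [tree_a, tree_b, tree_c]


def forest(x):
    """Random-forest-style output: the mean of the trees' attack probabilities."""
    return sum(tree(x) for tree in TREES) / len(TREES)


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to a single baseline record.

    The value of a coalition S is the model output when features in S take
    their values from x and all other features take them from baseline.
    Enumerating every coalition costs O(2^n) model calls.
    """
    n = len(FEATURES)

    def value(coalition):
        mixed = {f: (x[f] if f in coalition else baseline[f]) for f in FEATURES}
        return model(mixed)

    phi = {}
    for feature in FEATURES:
        others = [f for f in FEATURES if f != feature]
        total = 0.0
        for k in range(len(others) + 1):
            # Shapley weight for a coalition of size k that excludes `feature`.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for combo in combinations(others, k):
                coalition = set(combo)
                total += weight * (value(coalition | {feature}) - value(coalition))
        phi[feature] = total
    return phi


if __name__ == "__main__":
    baseline = {"in_bytes": 200, "out_pkts": 10, "flow_duration_ms": 100}
    flow = {"in_bytes": 5000, "out_pkts": 2, "flow_duration_ms": 20}
    phi = shapley_values(forest, flow, baseline)
    print(f"forest(flow) - forest(baseline) = {forest(flow) - forest(baseline):.3f}")
    # List features by the magnitude of their contribution to this prediction.
    for feature, contribution in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
        print(f"{feature:>18}: {contribution:+.3f}")
```

Two Shapley properties show up in this sketch: the per-feature contributions sum to forest(flow) minus forest(baseline) (local accuracy), and because the forest is an average of trees, its Shapley values equal the average of the per-tree values, so an ensemble can be explained one tree at a time. A feature a tree never reads, such as `flow_duration_ms` in `tree_b`, receives a contribution of exactly zero from that tree.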
ISSN: 1424-8220
DOI: 10.3390/s22031154 (Sensors, vol. 22, no. 3, article 1154)
Affiliations: Thi-Thu-Huong Le, IoT Research Center, Pusan National University, Busan 609735, Korea; Haeyoung Kim, Hyoeun Kang, and Howon Kim, School of Computer Science and Engineering, Pusan National University, Busan 609735, Korea