IGRF-RFE: a hybrid feature selection method for MLP-based network intrusion detection on UNSW-NB15 dataset

Abstract The effectiveness of machine learning models can be significantly impaired by redundant and irrelevant features in large datasets, which can cause drastic performance degradation. This paper proposes IGRF-RFE: a hybrid feature selection method tasked for multi-class network anomalie...

Bibliographic Details
Main Authors: Yuhua Yin, Julian Jang-Jaccard, Wen Xu, Amardeep Singh, Jinting Zhu, Fariza Sabrina, Jin Kwak
Format: Article
Language: English
Published: SpringerOpen 2023-02-01
Series: Journal of Big Data
Online Access: https://doi.org/10.1186/s40537-023-00694-8
collection DOAJ
description Abstract The effectiveness of machine learning models can be significantly impaired by redundant and irrelevant features in large datasets, which can cause drastic performance degradation. This paper proposes IGRF-RFE, a hybrid feature selection method for multi-class network anomaly detection using a multilayer perceptron (MLP) network. IGRF-RFE exploits the strengths of both a filter method, for its speed, and a wrapper method, for its relevance search. In the first phase of our approach, we combine two filter methods, information gain (IG) and random forest (RF), to reduce the feature subset search space. Combining the two allows RF to better manage the influence of less important features with high-frequency values that IG selects, so that more relevant features are included in the feature subset search space. In the second phase, we use a machine learning-based wrapper method that applies recursive feature elimination (RFE) to further reduce the feature dimensions while taking into account the relevance of similar features. Our experimental results on the UNSW-NB15 dataset confirm that the proposed method can improve the accuracy of anomaly detection, as it selects more relevant features while reducing the feature space. The number of features is reduced from 42 to 23, while the multi-class classification accuracy of the MLP improves from 82.25% to 84.24%.
first_indexed 2024-04-10T15:43:50Z
id doaj.art-1746c8593838417e8c0e867ecaf08e14
institution Directory Open Access Journal
issn 2196-1115
last_indexed 2024-04-10T15:43:50Z
spelling doaj.art-1746c8593838417e8c0e867ecaf08e14 2023-02-12T12:14:33Z eng SpringerOpen Journal of Big Data 2196-1115 2023-02-01 10 1 1 26 10.1186/s40537-023-00694-8
Yuhua Yin, Comp Sci/Info Tech, Cybersecurity Lab, Massey University
Julian Jang-Jaccard, Comp Sci/Info Tech, Cybersecurity Lab, Massey University
Wen Xu, Comp Sci/Info Tech, Cybersecurity Lab, Massey University
Amardeep Singh, Comp Sci/Info Tech, Cybersecurity Lab, Massey University
Jinting Zhu, Comp Sci/Info Tech, Cybersecurity Lab, Massey University
Fariza Sabrina, School of Engineering and Technology, Central Queensland University
Jin Kwak, Department of Cyber Security, Ajou University
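
The two-phase procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the IG and RF scores are combined or how the MLP ranks features during elimination, so the sketch assumes a sum of normalized scores for the filter phase and permutation importance for the wrapper phase. It runs on synthetic data from scikit-learn rather than UNSW-NB15, and the function names `filter_phase` and `wrapper_phase` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler


def filter_phase(X, y, keep, seed=0):
    """Phase 1 (filter): rank features by information gain plus RF importance.

    Assumption: the two scores are normalized to sum to 1 and added;
    the abstract does not specify the combination rule.
    """
    ig = mutual_info_classif(X, y, random_state=seed)
    rf = RandomForestClassifier(n_estimators=50, random_state=seed).fit(X, y)
    rf_imp = rf.feature_importances_
    combined = ig / (ig.sum() + 1e-12) + rf_imp / (rf_imp.sum() + 1e-12)
    # Indices of the `keep` highest-scoring features, in ascending index order.
    return np.sort(np.argsort(combined)[::-1][:keep])


def wrapper_phase(X_tr, y_tr, X_val, y_val, candidates, min_features, seed=0):
    """Phase 2 (wrapper): recursive feature elimination driven by an MLP.

    Assumption: the least useful feature in each round is the one with the
    lowest permutation importance on the validation split.
    """
    current = [int(i) for i in candidates]
    best_acc, best_subset = -1.0, list(current)
    while True:
        mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=300,
                            random_state=seed)
        mlp.fit(X_tr[:, current], y_tr)
        acc = mlp.score(X_val[:, current], y_val)
        # On ties, prefer the smaller subset found later in the loop.
        if acc >= best_acc:
            best_acc, best_subset = acc, list(current)
        if len(current) <= min_features:
            break
        imp = permutation_importance(mlp, X_val[:, current], y_val,
                                     n_repeats=3, random_state=seed)
        # Drop the feature whose shuffling hurts validation accuracy least.
        current.pop(int(np.argmin(imp.importances_mean)))
    return best_subset, best_acc


# Synthetic multi-class data standing in for UNSW-NB15 (assumption).
X, y = make_classification(n_samples=600, n_features=12, n_informative=5,
                           n_redundant=2, n_classes=3, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
scaler = StandardScaler().fit(X_tr)
X_tr, X_val = scaler.transform(X_tr), scaler.transform(X_val)

candidates = filter_phase(X_tr, y_tr, keep=8)
selected, val_acc = wrapper_phase(X_tr, y_tr, X_val, y_val,
                                  candidates, min_features=3)
print("filter-phase candidates:", candidates.tolist())
print("selected features:", selected, "validation accuracy:", round(val_acc, 4))
```

The filter phase is cheap because it scores every feature once, while the wrapper phase retrains the MLP after each elimination, so narrowing the candidate set first keeps the number of retraining rounds small.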