EmSM: Ensemble Mixed Sampling Method for Classifying Imbalanced Intrusion Detection Data

Research on the application of machine learning to the field of intrusion detection is attracting great interest. However, depending on the application, it is difficult to collect the data needed for training and testing, as the least frequent data type reflects the most serious threats, resulting i...

Bibliographic Details
Main Authors: Ilok Jung, Jaewon Ji, Changseob Cho
Format: Article
Language: English
Published: MDPI AG 2022-04-01
Series: Electronics
Subjects: imbalanced data; intrusion detection; machine learning; sampling method
Online Access: https://www.mdpi.com/2079-9292/11/9/1346
author Ilok Jung
Jaewon Ji
Changseob Cho
collection DOAJ
description Research on the application of machine learning to the field of intrusion detection is attracting great interest. However, depending on the application, it is difficult to collect the data needed for training and testing, as the least frequent data type reflects the most serious threats, resulting in imbalanced data, which leads to overfitting and hinders precise classification. To solve this problem, in this study, we propose a mixed resampling method using a hybrid synthetic minority oversampling technique with an edited neural network that increases the minority class and removes noisy data to generate a balanced dataset. A bagging ensemble algorithm is then used to optimize the model with the new data. We performed verification using two public intrusion detection datasets: PKDD2007 (balanced) and CSIC2012 (imbalanced). The proposed technique yields improved performance over state-of-the-art techniques. Furthermore, the proposed technique enables improved true positive identification and classification of serious threats that rarely occur, representing a major functional innovation.
first_indexed 2024-03-10T04:15:35Z
format Article
id doaj.art-f7dec455749f48fea06b4fe15d915d47
institution Directory Open Access Journal
issn 2079-9292
language English
last_indexed 2024-03-10T04:15:35Z
publishDate 2022-04-01
publisher MDPI AG
record_format Article
series Electronics
spelling doaj.art-f7dec455749f48fea06b4fe15d915d47; 2023-11-23T08:02:16Z; eng; MDPI AG; Electronics; ISSN 2079-9292; 2022-04-01; vol. 11, no. 9, article 1346; DOI 10.3390/electronics11091346
Ilok Jung (Graduate School of Information Security, Korea University, Seoul 02841, Korea)
Jaewon Ji (Cyber Security Research Laboratory, IGLOOSECURITY, Seoul 05836, Korea)
Changseob Cho (Cyber Security Research Laboratory, IGLOOSECURITY, Seoul 05836, Korea)
title EmSM: Ensemble Mixed Sampling Method for Classifying Imbalanced Intrusion Detection Data
topic imbalanced data
intrusion detection
machine learning
sampling method
url https://www.mdpi.com/2079-9292/11/9/1346
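
Illustrative sketch (not part of the catalogued record). The description above outlines a pipeline: oversample the minority class, remove noisy samples, then train a bagging ensemble on the balanced data. The Python below is a minimal, standard-library-only sketch of that idea, not the authors' implementation. It assumes the resampling step is SMOTE followed by edited-nearest-neighbour cleaning (the combination commonly abbreviated SMOTE-ENN), uses a k-NN base learner chosen purely for brevity, and runs on made-up two-dimensional toy data. All function names, parameters, and data are hypothetical.

```python
import math
import random
from collections import Counter

def nearest(point, pool, k):
    """Indices of the k points in pool closest to point (Euclidean distance)."""
    order = sorted(range(len(pool)), key=lambda i: math.dist(point, pool[i]))
    return order[:k]

def smote(minority, n_new, k, rng):
    """Create n_new synthetic points by interpolating between minority neighbours.
    Assumes len(minority) > k."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Skip the first ranked index: it is the base point itself (distance 0).
        neighbours = nearest(base, minority, k + 1)[1:]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic

def enn(X, y, k):
    """Drop every sample whose label differs from the majority of its k neighbours."""
    keep_X, keep_y = [], []
    for i, (x, label) in enumerate(zip(X, y)):
        others = [j for j in nearest(x, X, k + 1) if j != i][:k]
        vote = Counter(y[j] for j in others).most_common(1)[0][0]
        if vote == label:
            keep_X.append(x)
            keep_y.append(label)
    return keep_X, keep_y

def knn_predict(train_X, train_y, x, k):
    """Label of x by majority vote among its k nearest training points."""
    votes = Counter(train_y[j] for j in nearest(x, train_X, k))
    return votes.most_common(1)[0][0]

def bagging_predict(X, y, x, n_estimators, k, rng):
    """Majority vote of k-NN learners, each fitted on a bootstrap resample."""
    votes = Counter()
    for _ in range(n_estimators):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        votes[knn_predict([X[i] for i in idx], [y[i] for i in idx], x, k)] += 1
    return votes.most_common(1)[0][0]

# Toy imbalanced data (hypothetical): 40 "normal" points, 6 "attack" points.
rng = random.Random(0)
normal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
attack = [(rng.gauss(8, 1), rng.gauss(8, 1)) for _ in range(6)]

# Step 1: oversample the minority class until both classes have 40 samples.
synthetic = smote(attack, len(normal) - len(attack), k=3, rng=rng)
X = normal + attack + synthetic
y = ["normal"] * len(normal) + ["attack"] * (len(attack) + len(synthetic))

# Step 2: clean noisy samples; Step 3: classify a query point with the ensemble.
X_clean, y_clean = enn(X, y, k=3)
prediction = bagging_predict(X_clean, y_clean, (8.0, 8.0),
                             n_estimators=11, k=3, rng=rng)
print(len(X), len(X_clean), prediction)
```

In practice the same pipeline would more likely be assembled from library components such as imbalanced-learn's SMOTEENN and scikit-learn's BaggingClassifier; the hand-rolled version here only makes the three steps visible.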