EmSM: Ensemble Mixed Sampling Method for Classifying Imbalanced Intrusion Detection Data
Research on the application of machine learning to the field of intrusion detection is attracting great interest. However, depending on the application, it is difficult to collect the data needed for training and testing, as the least frequent data type reflects the most serious threats, resulting i...
Main Authors: | Ilok Jung, Jaewon Ji, Changseob Cho |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-04-01 |
Series: | Electronics |
Subjects: | imbalanced data; intrusion detection; machine learning; sampling method |
Online Access: | https://www.mdpi.com/2079-9292/11/9/1346 |
_version_ | 1797505226107781120 |
---|---|
author | Ilok Jung; Jaewon Ji; Changseob Cho |
author_facet | Ilok Jung; Jaewon Ji; Changseob Cho |
author_sort | Ilok Jung |
collection | DOAJ |
description | Research on the application of machine learning to the field of intrusion detection is attracting great interest. However, depending on the application, it is difficult to collect the data needed for training and testing, as the least frequent data type reflects the most serious threats, resulting in imbalanced data, which leads to overfitting and hinders precise classification. To solve this problem, in this study, we propose a mixed resampling method using a hybrid synthetic minority oversampling technique with an edited neural network that increases the minority class and removes noisy data to generate a balanced dataset. A bagging ensemble algorithm is then used to optimize the model with the new data. We performed verification using two public intrusion detection datasets: PKDD2007 (balanced) and CSIC2012 (imbalanced). The proposed technique yields improved performance over state-of-the-art techniques. Furthermore, the proposed technique enables improved true positive identification and classification of serious threats that rarely occur, representing a major functional innovation. |
first_indexed | 2024-03-10T04:15:35Z |
format | Article |
id | doaj.art-f7dec455749f48fea06b4fe15d915d47 |
institution | Directory Open Access Journal |
issn | 2079-9292 |
language | English |
last_indexed | 2024-03-10T04:15:35Z |
publishDate | 2022-04-01 |
publisher | MDPI AG |
record_format | Article |
series | Electronics |
spelling | doaj.art-f7dec455749f48fea06b4fe15d915d47; 2023-11-23T08:02:16Z; eng; MDPI AG; Electronics; 2079-9292; 2022-04-01; 11(9), 1346; 10.3390/electronics11091346; EmSM: Ensemble Mixed Sampling Method for Classifying Imbalanced Intrusion Detection Data; Ilok Jung (Graduate School of Information Security, Korea University, Seoul 02841, Korea); Jaewon Ji (Cyber Security Research Laboratory, IGLOOSECURITY, Seoul 05836, Korea); Changseob Cho (Cyber Security Research Laboratory, IGLOOSECURITY, Seoul 05836, Korea); Research on the application of machine learning to the field of intrusion detection is attracting great interest. However, depending on the application, it is difficult to collect the data needed for training and testing, as the least frequent data type reflects the most serious threats, resulting in imbalanced data, which leads to overfitting and hinders precise classification. To solve this problem, in this study, we propose a mixed resampling method using a hybrid synthetic minority oversampling technique with an edited neural network that increases the minority class and removes noisy data to generate a balanced dataset. A bagging ensemble algorithm is then used to optimize the model with the new data. We performed verification using two public intrusion detection datasets: PKDD2007 (balanced) and CSIC2012 (imbalanced). The proposed technique yields improved performance over state-of-the-art techniques. Furthermore, the proposed technique enables improved true positive identification and classification of serious threats that rarely occur, representing a major functional innovation.; https://www.mdpi.com/2079-9292/11/9/1346; imbalanced data; intrusion detection; machine learning; sampling method |
spellingShingle | Ilok Jung; Jaewon Ji; Changseob Cho; EmSM: Ensemble Mixed Sampling Method for Classifying Imbalanced Intrusion Detection Data; Electronics; imbalanced data; intrusion detection; machine learning; sampling method |
title | EmSM: Ensemble Mixed Sampling Method for Classifying Imbalanced Intrusion Detection Data |
title_full | EmSM: Ensemble Mixed Sampling Method for Classifying Imbalanced Intrusion Detection Data |
title_fullStr | EmSM: Ensemble Mixed Sampling Method for Classifying Imbalanced Intrusion Detection Data |
title_full_unstemmed | EmSM: Ensemble Mixed Sampling Method for Classifying Imbalanced Intrusion Detection Data |
title_short | EmSM: Ensemble Mixed Sampling Method for Classifying Imbalanced Intrusion Detection Data |
title_sort | emsm ensemble mixed sampling method for classifying imbalanced intrusion detection data |
topic | imbalanced data; intrusion detection; machine learning; sampling method |
url | https://www.mdpi.com/2079-9292/11/9/1346 |
work_keys_str_mv | AT ilokjung emsmensemblemixedsamplingmethodforclassifyingimbalancedintrusiondetectiondata AT jaewonji emsmensemblemixedsamplingmethodforclassifyingimbalancedintrusiondetectiondata AT changseobcho emsmensemblemixedsamplingmethodforclassifyingimbalancedintrusiondetectiondata |
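
The description field in this record outlines a three-step pipeline: oversample the minority class with a synthetic minority oversampling technique (SMOTE), remove noisy samples with an editing step, and train a bagging ensemble on the rebalanced data. The following is a minimal plain-Python sketch of that idea, not the authors' implementation. It assumes the editing step is the edited nearest neighbours (ENN) rule that is commonly paired with SMOTE, whereas the record's wording is "edited neural network". The function names, the toy two-dimensional data, the value `k=3`, and the 1-nearest-neighbour base learner are all hypothetical choices made for illustration.

```python
import random
from collections import Counter

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def knn_indices(X, i, k):
    """Indices of the k nearest neighbours of X[i], excluding i itself."""
    others = [j for j in range(len(X)) if j != i]
    return sorted(others, key=lambda j: dist2(X[i], X[j]))[:k]

def smote(X, y, minority, k=3, rng=None):
    """Add interpolated minority samples until the minority class
    is as large as the largest class (illustrative SMOTE variant)."""
    rng = rng or random.Random(0)
    pts = [x for x, label in zip(X, y) if label == minority]
    need = max(Counter(y).values()) - len(pts)
    X_new, y_new = list(X), list(y)
    for _ in range(need):
        i = rng.randrange(len(pts))
        j = rng.choice(knn_indices(pts, i, k))  # a random near neighbour
        gap = rng.random()                      # position along the segment
        X_new.append(tuple(a + gap * (b - a) for a, b in zip(pts[i], pts[j])))
        y_new.append(minority)
    return X_new, y_new

def enn(X, y, k=3):
    """Edited nearest neighbours (assumed editing rule): drop every sample
    whose label disagrees with the majority of its k nearest neighbours."""
    keep = []
    for i in range(len(X)):
        votes = Counter(y[j] for j in knn_indices(X, i, k))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]

def bagging_predict(X, y, query, n_models=11, rng=None):
    """Majority vote of 1-nearest-neighbour learners, each fitted on a
    bootstrap sample of the training data."""
    rng = rng or random.Random(0)
    votes = Counter()
    for _ in range(n_models):
        sample = [rng.randrange(len(X)) for _ in range(len(X))]
        nearest = min(sample, key=lambda j: dist2(X[j], query))
        votes[y[nearest]] += 1
    return votes.most_common(1)[0][0]

# Toy data: 20 "normal" points on a grid, 4 "attack" points far away,
# and one mislabelled "normal" point sitting inside the attack cluster.
normal = [(0.5 * (i % 5), 0.5 * (i // 5)) for i in range(20)]
attack = [(5.0, 5.0), (5.5, 5.0), (5.0, 5.5), (5.5, 5.5)]
X = normal + attack + [(5.2, 5.2)]
y = ["normal"] * 20 + ["attack"] * 4 + ["normal"]

X_os, y_os = smote(X, y, minority="attack")  # oversample the minority class
X_bal, y_bal = enn(X_os, y_os)               # remove the noisy sample
print(Counter(y), Counter(y_os), Counter(y_bal))
print(bagging_predict(X_bal, y_bal, (5.1, 5.1)))
```

On this toy data the oversampling step raises the attack class from 4 to 21 samples, and the editing step removes only the mislabelled point, leaving 20 normal and 21 attack samples for the ensemble. A practical implementation would more likely rely on an established resampling library and a stronger base learner than on this hand-rolled version.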