Machine Learning With Variational AutoEncoder for Imbalanced Datasets in Intrusion Detection

As a result of the explosion of security attacks and the complexity of modern networks, machine learning (ML) has recently become the favored approach for intrusion detection systems (IDS). However, the ML approach usually faces three challenges: massive attack variants, imbalanced data issues, and...

Bibliographic Details
Main Authors:	Ying-Dar Lin, Zi-Qiang Liu, Ren-Hung Hwang, Van-Linh Nguyen, Po-Ching Lin, Yuan-Cheng Lai
Format:	Article
Language:	English
Published:	IEEE 2022-01-01
Series:	IEEE Access
Subjects:	Imbalanced dataset; machine learning; variational autoencoder; intrusion detection
Online Access:	https://ieeexplore.ieee.org/document/9705580/

author	Ying-Dar Lin, Zi-Qiang Liu, Ren-Hung Hwang, Van-Linh Nguyen, Po-Ching Lin, Yuan-Cheng Lai
author_sort	Ying-Dar Lin
collection	DOAJ
description	As a result of the explosion of security attacks and the complexity of modern networks, machine learning (ML) has recently become the favored approach for intrusion detection systems (IDS). However, the ML approach usually faces three challenges: massive attack variants, imbalanced data issues, and appropriate data segmentation. Improper handling of these issues will significantly degrade ML performance, e.g., resulting in high false-negative rates and low recall rates. Although many efforts have been made in the literature, detecting security attacks in a complicated network environment with imperfect data collection is still an open issue. This work proposes a machine learning framework that combines a variational autoencoder and a multilayer perceptron model to deal with imbalanced datasets and detect the explosion of attack variants on the Internet. The detection engine also includes an efficient range-based sequential search algorithm to effectively address the segmentation challenge in pre-processing data from multiple sources (network packets, system/statistic logs). Our work is the first attempt to demonstrate the effect of using an appropriate combination of ML models for boosting IDS detection capability in a heterogeneous environment, where data collection imperfection is common. Experimental results on a public system log dataset (e.g., HDFS) show that our method achieves up to approximately 97% on F1 score and 98% on recall rate, a promising result compared to the same measurements of other solutions. Even better, we found that the proposed treatment of imbalanced datasets can improve the F1 score by up to 35% and the recall rate by up to 27%. The testing results also indicate that our model can detect new attack variants.
first_indexed	2024-12-13T11:17:09Z
format	Article
id	doaj.art-0b894e7af5fd4cab87e539ceac029d9b
institution	Directory of Open Access Journals
issn	2169-3536
language	English
last_indexed	2024-12-13T11:17:09Z
publishDate	2022-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-0b894e7af5fd4cab87e539ceac029d9b; 2022-12-21T23:48:35Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2022-01-01; vol. 10, pp. 15247-15260; DOI 10.1109/ACCESS.2022.3149295; article no. 9705580; Machine Learning With Variational AutoEncoder for Imbalanced Datasets in Intrusion Detection; authors: Ying-Dar Lin [0] (https://orcid.org/0000-0002-5226-4396), Zi-Qiang Liu [1], Ren-Hung Hwang [2] (https://orcid.org/0000-0001-7996-4184), Van-Linh Nguyen [3] (https://orcid.org/0000-0002-3472-0108), Po-Ching Lin [4] (https://orcid.org/0000-0001-8294-5857), Yuan-Cheng Lai [5] (https://orcid.org/0000-0003-3695-5784); affiliations, in record order: [0] Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu City, Taiwan; [1] Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu City, Taiwan; [2] Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu City, Taiwan; [3] Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi County, Taiwan; [4] Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi County, Taiwan; [5] Department of Information Management, National Taiwan University of Science and Technology, Taipei City, Taiwan
title	Machine Learning With Variational AutoEncoder for Imbalanced Datasets in Intrusion Detection
topic	Imbalanced dataset; machine learning; variational autoencoder; intrusion detection
url	https://ieeexplore.ieee.org/document/9705580/