Toward an Efficient Automatic Self-Augmentation Labeling Tool for Intrusion Detection Based on a Semi-Supervised Approach

Intrusion detection systems (IDSs) based on machine learning algorithms represent a key component for securing computer networks, where normal and abnormal behaviours of network traffic are automatically learned with no or limited domain experts’ interference. Most of existing IDS approaches rely on...

Full description

Bibliographic Details
Main Authors: Basmah Alsulami, Abdulmohsen Almalawi, Adil Fahad
Format: Article
Language:English
Published: MDPI AG 2022-07-01
Series:Applied Sciences
Subjects:
Online Access:https://www.mdpi.com/2076-3417/12/14/7189
Description
Summary:Intrusion detection systems (IDSs) based on machine learning algorithms represent a key component for securing computer networks, where normal and abnormal behaviours of network traffic are automatically learned with no or limited domain experts’ interference. Most of existing IDS approaches rely on labeled predefined classes which require domain experts to efficiently and accurately identify anomalies and threats. However, it is very hard to acquire reliable, up-to-date, and sufficient labeled data for an efficient traffic intrusion detection model. To address such an issue, this paper aims to develop a novel self-automatic labeling intrusion detection approach (called SAL) which utilises only small labeled network traffic data to potentially detect most types of attacks including zero-day attacks. In particular, the proposed SAL approach has three phases including: (i) an ensemble-based decision-making phase to address the limitations of a single classifier by relying on the predictions of multi-classifiers, (ii) a function agreement phase to assign the class label based on an adaptive confidence threshold to unlabeled observations, and (iii) an augmentation labeling phase to maximise the accuracy and the efficiency of the intrusion detection systems in a classifier model and to detect new attacks and anomalies by utilising a hybrid voting-based ensemble learning approach. Experimental results on available network traffic data sets demonstrate that the proposed SAL approach achieves high performance in comparison to two well-known baseline IDSs based on machine learning algorithms.
ISSN:2076-3417