Classification Hardness for Supervised Learners on 20 Years of Intrusion Detection Data

Bibliographic Details
Main Authors: Laurens D'hooge, Tim Wauters, Bruno Volckaert, Filip De Turck
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/8901110/
Collection: DOAJ
Description: This article consolidates the analysis of established (NSL-KDD) and new (ISCXIDS2012, CICIDS2017, CICIDS2018) intrusion detection datasets using supervised machine learning (ML) algorithms. The uniform analysis procedure makes the obtained results directly comparable. It also provides a stronger foundation for conclusions about the efficacy of supervised learners on the main classification task in network security. The research is motivated in part by the lack of adoption of these modern datasets. The scope is deliberately broad, covering classification by algorithms from different families on both established and new datasets, in order to expand the existing foundation and reveal the most opportune avenues for further inquiry. After baseline results were obtained, the classification task was made more difficult by reducing the data available to learn from, both horizontally and vertically. The data reduction serves as a stress test to verify whether the very high baseline results hold up under increasingly harsh constraints. Ultimately, this work contains the most comprehensive set of results on the topic of intrusion detection through supervised machine learning. Researchers working on algorithmic improvements can compare their results to this collection, knowing that all results reported here were gathered through a uniform framework. This work's main contributions are the outstanding classification results on the current state-of-the-art datasets for intrusion detection and the conclusion that these methods show remarkable resilience in classification performance even when the amount of data to learn from is aggressively reduced.
ISSN: 2169-3536
Volume: 7
Pages: 167455-167469
DOI: 10.1109/ACCESS.2019.2953451
Author ORCID: Laurens D'hooge, https://orcid.org/0000-0001-5086-6361
Affiliation (all authors): Department of Information Technology, IDLab, Ghent University–imec, Ghent, Belgium
Subjects: CICIDS2017, CICIDS2018, cyber security, intrusion detection, ISCXIDS2012, network security