Extracting salient features for network intrusion detection using machine learning methods

This work presents a data preprocessing and feature selection framework to support data mining and network security experts in minimal feature set selection of intrusion detection data. This process is supported by detailed visualisation and examination of class distributions. Distribution histogram...

Bibliographic Details
Main Authors:	Ralf C. Staudemeyer, Christian W. Omlin
Format:	Article
Language:	English
Published:	South African Institute of Computer Scientists and Information Technologists 2014-06-01
Series:	South African Computer Journal
Subjects:	network intrusion detection: feature selection; machine learning; decision trees
Online Access:	http://sacj.cs.uct.ac.za/index.php/sacj/article/view/200

collection	DOAJ
description	This work presents a data preprocessing and feature selection framework to support data mining and network security experts in selecting minimal feature sets for intrusion detection data. This process is supported by detailed visualisation and examination of class distributions. Distribution histograms, scatter plots and information gain are presented as supportive feature reduction tools. The feature reduction process applied is based on decision tree pruning and backward elimination. This paper starts with an analysis of the KDD Cup '99 datasets and their potential for feature reduction. The dataset consists of connection records with 41 features whose relevance for intrusion detection is not clear. All traffic is classified either as 'normal' or as one of four attack types: denial-of-service, network probe, remote-to-local or user-to-root. Using our custom feature selection process, we show how we can significantly reduce the number of features in the dataset to a few salient features. We conclude by presenting minimal sets with 4-8 salient features for two-class and multi-class categorisation for detecting intrusions, as well as for the detection of individual attack classes; the performance using a static classifier compares favourably to the performance using all features available. The suggested process is general in nature and can be applied to any similar dataset.
id	doaj.art-a4e3cee8ebfc44b589f161a054d270a4
institution	Directory of Open Access Journals
issn	1015-7999 2313-7835

Similar Items