Clustering-based label estimation for network anomaly detection

A substantial body of work has sought to identify network anomalies using supervised and unsupervised learning techniques, each with its own strengths and weaknesses. In this work, we propose a new approach that combines the advantages of unsupervised and supervised learning. The main obje...


Bibliographic Details
Main Authors: Sunhee Baek, Donghwoon Kwon, Sang C. Suh, Hyunjoo Kim, Ikkyun Kim, Jinoh Kim
Format: Article
Language: English
Published: KeAi Communications Co., Ltd. 2021-02-01
Series: Digital Communications and Networks
Subjects: Label estimation; Network anomaly detection; Clustering randomness
Online Access: http://www.sciencedirect.com/science/article/pii/S2352864818301779
author Sunhee Baek
Donghwoon Kwon
Sang C. Suh
Hyunjoo Kim
Ikkyun Kim
Jinoh Kim
collection DOAJ
description A substantial body of work has sought to identify network anomalies using supervised and unsupervised learning techniques, each with its own strengths and weaknesses. In this work, we propose a new approach that combines the advantages of unsupervised and supervised learning. The main objective of the proposed approach is to enable supervised anomaly detection without requiring users to provide the associated labels. To this end, we estimate the label of each connection in the training phase using clustering. The “estimated” labels are then used to build a supervised learning model for the subsequent classification of connections in the testing phase. To improve the quality of the estimated labels, we define a new property that characterizes anomalies in the context of network anomaly detection. Through extensive experiments with a public dataset (NSL-KDD), we show that the proposed method can achieve performance comparable to that obtained with the “original” labels provided in the dataset. We also introduce two heuristic functions that minimize the impact of clustering randomness and thereby improve the overall quality of the estimated labels.
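The pipeline outlined in the description (cluster the unlabeled training connections, turn the clusters into estimated labels, then train a supervised model on those labels) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the synthetic 2-D data, the k-means and k-NN choices, the rule "the largest cluster is normal", and the majority vote over several seeds are all hypothetical stand-ins. The record does not say which property or which two heuristic functions the article actually uses.

```python
import random
from collections import Counter


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed, iters=50):
    """Plain k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initialisation
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: dist2(p, centroids[c]))
                  for p in points]
        updated = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                updated.append(tuple(sum(col) / len(members)
                                     for col in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return assign


def estimate_labels(points, k=2, seeds=(0, 1, 2, 3, 4)):
    """Estimate labels (0 = normal, 1 = anomaly) without ground truth.

    Assumption (not from the article): the largest cluster is normal.
    Assumption (not from the article): a majority vote over several
    seeds dampens the randomness of k-means initialisation.
    """
    votes = [Counter() for _ in points]
    for seed in seeds:
        assign = kmeans(points, k, seed)
        sizes = Counter(assign)
        normal = max(sizes, key=sizes.get)
        for i, a in enumerate(assign):
            votes[i][0 if a == normal else 1] += 1
    return [v.most_common(1)[0][0] for v in votes]


def knn_predict(train_x, train_y, x, k=3):
    """Supervised step: k-nearest-neighbour vote on the estimated labels."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: dist2(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def make_data(seed=42):
    """Synthetic stand-in for connection features (not NSL-KDD)."""
    rng = random.Random(seed)
    normal = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(90)]
    anomalous = [(rng.gauss(6, 0.5), rng.gauss(6, 0.5)) for _ in range(10)]
    return normal + anomalous, [0] * 90 + [1] * 10


if __name__ == "__main__":
    train_x, true_y = make_data()
    est_y = estimate_labels(train_x)  # training phase: no labels supplied
    agreement = sum(e == t for e, t in zip(est_y, true_y)) / len(true_y)
    print("agreement with held-back labels:", agreement)
    print("test connection (5.8, 6.1) ->", knn_predict(train_x, est_y, (5.8, 6.1)))
```

The true labels in `make_data` are used only to check how well the estimated labels agree with them; the classifier itself never sees them, which mirrors the comparison against the "original" labels mentioned in the description.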
first_indexed 2024-12-20T11:39:39Z
format Article
id doaj.art-c9a0d392f6364502b42a4d8b98d4c5d9
institution Directory Open Access Journal
issn 2352-8648
language English
last_indexed 2024-12-20T11:39:39Z
publishDate 2021-02-01
publisher KeAi Communications Co., Ltd.
record_format Article
series Digital Communications and Networks
spelling doaj.art-c9a0d392f6364502b42a4d8b98d4c5d9
2022-12-21T19:42:01Z
eng
KeAi Communications Co., Ltd.
Digital Communications and Networks
2352-8648
2021-02-01
vol. 7, no. 1, pp. 37-44
Clustering-based label estimation for network anomaly detection
Sunhee Baek (Computer Science Department, Texas A&M University, Commerce, TX, 75429, USA)
Donghwoon Kwon (Computer Science Department, Texas A&M University, Commerce, TX, 75429, USA)
Sang C. Suh (Computer Science Department, Texas A&M University, Commerce, TX, 75429, USA)
Hyunjoo Kim (ETRI, 218 Gajeong-ro, Yuseong-gu, Daejeon, 34129, Republic of Korea)
Ikkyun Kim (ETRI, 218 Gajeong-ro, Yuseong-gu, Daejeon, 34129, Republic of Korea)
Jinoh Kim (Computer Science Department, Texas A&M University, Commerce, TX, 75429, USA; corresponding author)
http://www.sciencedirect.com/science/article/pii/S2352864818301779
Label estimation
Network anomaly detection
Clustering randomness
title Clustering-based label estimation for network anomaly detection
topic Label estimation
Network anomaly detection
Clustering randomness
url http://www.sciencedirect.com/science/article/pii/S2352864818301779