Clustering-based label estimation for network anomaly detection
A substantial body of work has been done to identify network anomalies using supervised and unsupervised learning techniques, each with its own strengths and weaknesses. In this work, we propose a new approach that combines the advantages of unsupervised and supervised learning. The main obje...
Main Authors: | Sunhee Baek, Donghwoon Kwon, Sang C. Suh, Hyunjoo Kim, Ikkyun Kim, Jinoh Kim |
---|---|
Format: | Article |
Language: | English |
Published: | KeAi Communications Co., Ltd., 2021-02-01 |
Series: | Digital Communications and Networks |
Subjects: | Label estimation; Network anomaly detection; Clustering randomness |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2352864818301779 |
_version_ | 1818959313083826176 |
---|---|
author | Sunhee Baek; Donghwoon Kwon; Sang C. Suh; Hyunjoo Kim; Ikkyun Kim; Jinoh Kim |
author_facet | Sunhee Baek; Donghwoon Kwon; Sang C. Suh; Hyunjoo Kim; Ikkyun Kim; Jinoh Kim |
author_sort | Sunhee Baek |
collection | DOAJ |
description | A substantial body of work has been done to identify network anomalies using supervised and unsupervised learning techniques, each with its own strengths and weaknesses. In this work, we propose a new approach that combines the advantages of unsupervised and supervised learning. The main objective of the proposed approach is to enable supervised anomaly detection without requiring users to provide the associated labels. To this end, we estimate the label of each connection in the training phase using clustering. The “estimated” labels are then used to build a supervised learning model for the subsequent classification of connections in the testing stage. To improve the quality of the estimated labels, we define a new property that characterizes anomalies in the context of network anomaly detection. Through extensive experiments with a public dataset (NSL-KDD), we show that the proposed method can achieve performance comparable to that obtained with the “original” labels provided in the dataset. We also introduce two heuristic functions that minimize the impact of clustering randomness and thereby improve the overall quality of the estimated labels. |
first_indexed | 2024-12-20T11:39:39Z |
format | Article |
id | doaj.art-c9a0d392f6364502b42a4d8b98d4c5d9 |
institution | Directory of Open Access Journals |
issn | 2352-8648 |
language | English |
last_indexed | 2024-12-20T11:39:39Z |
publishDate | 2021-02-01 |
publisher | KeAi Communications Co., Ltd. |
record_format | Article |
series | Digital Communications and Networks |
spelling | doaj.art-c9a0d392f6364502b42a4d8b98d4c5d9; 2022-12-21T19:42:01Z; eng; KeAi Communications Co., Ltd.; Digital Communications and Networks; 2352-8648; 2021-02-01; vol. 7, no. 1, pp. 37-44; Clustering-based label estimation for network anomaly detection; Sunhee Baek (0), Donghwoon Kwon (1), Sang C. Suh (2), Hyunjoo Kim (3), Ikkyun Kim (4), Jinoh Kim (5); Affiliations, in author order: (0) Computer Science Department, Texas A&M University, Commerce, TX, 75429, USA; (1) Computer Science Department, Texas A&M University, Commerce, TX, 75429, USA; (2) Computer Science Department, Texas A&M University, Commerce, TX, 75429, USA; (3) ETRI, 218 Gajeong-ro, Yuseong-gu, Daejeon, 34129, Republic of Korea; (4) ETRI, 218 Gajeong-ro, Yuseong-gu, Daejeon, 34129, Republic of Korea; (5) Computer Science Department, Texas A&M University, Commerce, TX, 75429, USA (corresponding author); A substantial body of work has been done to identify network anomalies using supervised and unsupervised learning techniques, each with its own strengths and weaknesses. In this work, we propose a new approach that combines the advantages of unsupervised and supervised learning. The main objective of the proposed approach is to enable supervised anomaly detection without requiring users to provide the associated labels. To this end, we estimate the label of each connection in the training phase using clustering. The “estimated” labels are then used to build a supervised learning model for the subsequent classification of connections in the testing stage. To improve the quality of the estimated labels, we define a new property that characterizes anomalies in the context of network anomaly detection. Through extensive experiments with a public dataset (NSL-KDD), we show that the proposed method can achieve performance comparable to that obtained with the “original” labels provided in the dataset. We also introduce two heuristic functions that minimize the impact of clustering randomness and thereby improve the overall quality of the estimated labels.; http://www.sciencedirect.com/science/article/pii/S2352864818301779; Label estimation; Network anomaly detection; Clustering randomness |
spellingShingle | Sunhee Baek; Donghwoon Kwon; Sang C. Suh; Hyunjoo Kim; Ikkyun Kim; Jinoh Kim; Clustering-based label estimation for network anomaly detection; Digital Communications and Networks; Label estimation; Network anomaly detection; Clustering randomness |
title | Clustering-based label estimation for network anomaly detection |
title_sort | clustering based label estimation for network anomaly detection |
topic | Label estimation; Network anomaly detection; Clustering randomness |
url | http://www.sciencedirect.com/science/article/pii/S2352864818301779 |
work_keys_str_mv | AT sunheebaek clusteringbasedlabelestimationfornetworkanomalydetection AT donghwoonkwon clusteringbasedlabelestimationfornetworkanomalydetection AT sangcsuh clusteringbasedlabelestimationfornetworkanomalydetection AT hyunjookim clusteringbasedlabelestimationfornetworkanomalydetection AT ikkyunkim clusteringbasedlabelestimationfornetworkanomalydetection AT jinohkim clusteringbasedlabelestimationfornetworkanomalydetection |
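
The abstract in the description field outlines a two-stage pipeline: cluster the unlabeled training connections to estimate their labels, then train a supervised model on those estimated labels and use it to classify test connections. The following is a minimal sketch of that idea only, not the authors' implementation. Everything specific in it is an assumption made for illustration: the toy 2-D feature vectors (the paper uses NSL-KDD), the plain k-means with deterministic farthest-point initialisation, the rule that the largest cluster is "normal" (the paper defines its own anomaly property, which the record does not spell out), and the k-nearest-neighbour classifier standing in for the supervised learner. The two heuristic functions mentioned in the abstract are not reproduced.

```python
import math
from collections import Counter


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means; returns one cluster index per point.

    Initialisation is deterministic (farthest-point), so this sketch avoids
    the clustering randomness that the paper addresses with its heuristics.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from its nearest chosen centroid.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    assign = []
    for _ in range(iters):
        new_assign = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_assign == assign:  # converged
            break
        assign = new_assign
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return assign


def estimate_labels(points, k=2):
    """Stage 1: estimate a label for every training point from its cluster.

    ASSUMPTION (not from the paper): the largest cluster is normal traffic
    and every other cluster is anomalous.
    """
    assign = kmeans(points, k)
    sizes = Counter(assign)
    largest = max(sizes, key=sizes.get)
    return ["normal" if a == largest else "anomaly" for a in assign]


def knn_predict(train_x, train_y, x, k=3):
    """Stage 2: a k-nearest-neighbour classifier trained on estimated labels."""
    nearest = sorted(zip(train_x, train_y), key=lambda t: math.dist(t[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical, already-scaled connection features (two per connection).
train = [
    (1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2), (0.9, 0.8), (1.0, 1.3),
    (8.0, 8.0), (8.3, 7.9),
]
estimated = estimate_labels(train)

print(knn_predict(train, estimated, (1.05, 1.0)))  # prints "normal"
print(knn_predict(train, estimated, (7.9, 8.1)))   # prints "anomaly"
```

No ground-truth label is passed to either stage; the classifier sees only the labels produced by clustering, which is the property the abstract emphasises.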