Generating Labeled Training Datasets Towards Unified Network Intrusion Detection Systems

It is crucial to implement innovative artificial intelligence (AI)-powered network intrusion detection systems (NIDSes) to protect enterprise networks from cyberattacks, which have recently become more diverse and sophisticated. High-quality labeled training datasets are required to train AI-powered...

Bibliographic Details
Main Authors:	Ryosuke Ishibashi, Kohei Miyamoto, Chansu Han, Tao Ban, Takeshi Takahashi, Jun'ichi Takeuchi
Format:	Article
Language:	English
Published:	IEEE 2022-01-01
Series:	IEEE Access
Subjects:	Network intrusion detection system; security alert; packet data analysis; security data labeling; public dataset; packet replay
Online Access:	https://ieeexplore.ieee.org/document/9777676/

_version_	1818552205128499200
author	Ryosuke Ishibashi, Kohei Miyamoto, Chansu Han, Tao Ban, Takeshi Takahashi, Jun'ichi Takeuchi
collection	DOAJ
description	It is crucial to implement innovative artificial intelligence (AI)-powered network intrusion detection systems (NIDSes) to protect enterprise networks from cyberattacks, which have recently become more diverse and sophisticated. High-quality labeled training datasets are required to train AI-powered NIDSes; such datasets are globally scarce, and generating new training datasets is considered cumbersome. In this study, we investigate the possibility of an approach that integrates the strengths of existing security appliances to generate labeled training datasets that can be leveraged to develop brand-new AI-powered cybersecurity solutions. We begin by locating communication flows that the deployed NIDSes detect as suspicious, investigating their causal factors, and assigning appropriate labels in a universal format. Then, we output the packet data in the identified communication flows and the corresponding alert-type labels as labeled data. We demonstrate the effectiveness of the labeling scheme by evaluating classification models trained with the labeled dataset we generated. Furthermore, we provide case studies that examine the performance of several commonly used NIDSes and practical approaches to automating the security triage process. Labeled datasets in this study are generated using public datasets and open-source NIDSes to ensure the reproducibility of the results. The datasets and the software tools are made publicly accessible for research use.
first_indexed	2024-12-12T09:10:09Z
format	Article
id	doaj.art-ca5b571c8e454d2bb3bb7aa5b73b4eb6
institution	Directory of Open Access Journals
issn	2169-3536
language	English
last_indexed	2024-12-12T09:10:09Z
publishDate	2022-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
doi	10.1109/ACCESS.2022.3176098
volume	10
pages	53972-53986
author_orcid	Kohei Miyamoto: https://orcid.org/0000-0002-0977-4155; Chansu Han: https://orcid.org/0000-0002-1728-5300; Tao Ban: https://orcid.org/0000-0002-9616-3212; Takeshi Takahashi: https://orcid.org/0000-0002-6477-7770; Jun'ichi Takeuchi: https://orcid.org/0000-0002-5819-3082
author_affiliation	Ryosuke Ishibashi, Kohei Miyamoto, Jun'ichi Takeuchi: Graduate School and Faculty of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan; Chansu Han, Tao Ban, Takeshi Takahashi: National Institute of Information and Communications Technology, Koganei, Tokyo, Japan
title	Generating Labeled Training Datasets Towards Unified Network Intrusion Detection Systems
topic	Network intrusion detection system; security alert; packet data analysis; security data labeling; public dataset; packet replay
url	https://ieeexplore.ieee.org/document/9777676/

Similar Items