GPT and Interpolation-Based Data Augmentation for Multiclass Intrusion Detection in IIoT
The absence of essential security protocols in Industrial Internet of Things (IIoT) networks introduces cybersecurity vulnerabilities and turns them into potential targets for various attack types. Although machine learning has been used for intrusion detection in the IIoT, datasets with representat...
Main Authors: | Francisco S. Melicias, Tiago F. R. Ribeiro, Carlos Rabadao, Leonel Santos, Rogerio Luis De C. Costa |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2024-01-01 |
Series: | IEEE Access |
Subjects: | IIoT, cybersecurity, data augmentation, machine learning |
Online Access: | https://ieeexplore.ieee.org/document/10418592/ |
_version_ | 1797316054196682752 |
---|---|
author | Francisco S. Melicias Tiago F. R. Ribeiro Carlos Rabadao Leonel Santos Rogerio Luis De C. Costa |
author_sort | Francisco S. Melicias |
collection | DOAJ |
description | The absence of essential security protocols in Industrial Internet of Things (IIoT) networks introduces cybersecurity vulnerabilities and turns them into potential targets for various attack types. Although machine learning has been used for intrusion detection in the IIoT, datasets with representative data on common attacks in IIoT network traffic are limited and often imbalanced. Data augmentation techniques address these problems by creating artificial data in classes with fewer samples. In this work, we evaluate the use of data augmentation when training intrusion detection models based on IIoT traffic data. We compare Generative Pre-trained Transformers (GPT) and variations on the Synthetic Minority Over-sampling TEchnique (SMOTE) and evaluate their capability to enhance intrusion detection performance. We compare the performance of five intrusion detection algorithms trained with augmented datasets against that of models trained with the original non-augmented dataset. To ensure a fair comparison, we evaluated the algorithms’ performance in the different scenarios using the same test dataset, which does not contain synthetic data. The results show the need for a systematic evaluation before employing data augmentation, as its impact on classification performance depends on the algorithm, the data, and the technique used. While deep neural networks benefit from data augmentation, eXtreme Gradient Boosting (XGBoost), which achieved the best intrusion detection performance among all evaluated classifiers (with an F1-Score over 91%), showed no performance improvement when trained with augmented data. The evaluation of data generated by GPT-based methods shows that such methods (especially GReaT) generate invalid data for both numerical and categorical features in a way that leads to performance degradation in multiclass classification. |
first_indexed | 2024-03-08T03:13:45Z |
format | Article |
id | doaj.art-67298056701b4713b5999aeced87d686 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-03-08T03:13:45Z |
publishDate | 2024-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-67298056701b4713b5999aeced87d686; 2024-02-13T00:00:52Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2024-01-01; vol. 12, pp. 17945-17965; DOI 10.1109/ACCESS.2024.3360879; 10418592; GPT and Interpolation-Based Data Augmentation for Multiclass Intrusion Detection in IIoT; Francisco S. Melicias (CIIC, ESTG, Polytechnic of Leiria, Leiria, Portugal); Tiago F. R. Ribeiro, https://orcid.org/0000-0003-1603-1218 (CIIC, ESTG, Polytechnic of Leiria, Leiria, Portugal); Carlos Rabadao, https://orcid.org/0000-0001-7332-4397 (CIIC, ESTG, Polytechnic of Leiria, Leiria, Portugal); Leonel Santos, https://orcid.org/0000-0002-6883-7996 (CIIC, ESTG, Polytechnic of Leiria, Leiria, Portugal); Rogerio Luis De C. Costa, https://orcid.org/0000-0003-2306-7585 (CIIC, Polytechnic of Leiria, Leiria, Portugal); https://ieeexplore.ieee.org/document/10418592/; IIoT; cybersecurity; data augmentation; machine learning |
title | GPT and Interpolation-Based Data Augmentation for Multiclass Intrusion Detection in IIoT |
title_sort | gpt and interpolation based data augmentation for multiclass intrusion detection in iiot |
topic | IIoT cybersecurity data augmentation machine learning |
url | https://ieeexplore.ieee.org/document/10418592/ |