Intrusion Detection Based on Sequential Information Preserving Log Embedding Methods and Anomaly Detection Algorithms

Previous methods for system intrusion detection have mainly relied on pattern matching that employs prior knowledge extracted from experts’ domain knowledge. However, pattern matching-based methods have a major drawback: they can be bypassed through various modified te...

Bibliographic Details
Main Authors: Czangyeob Kim, Myeongjun Jang, Seungwan Seo, Kyeongchan Park, Pilsung Kang
Format: Article
Language: English
Published: IEEE 2021-01-01
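Series: IEEE Access
Subjects: System anomaly detection; cyber security; system log embedding; advanced persistent threat; ADFA-LD
Online Access: https://ieeexplore.ieee.org/document/9399070/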
collection DOAJ
description Previous methods for system intrusion detection have mainly relied on pattern matching that employs prior knowledge extracted from experts’ domain knowledge. However, pattern matching-based methods have a major drawback: they can be bypassed through various modified techniques. Advanced persistent threats expose the limits of pattern matching-based detection mechanisms, because they are not only more sophisticated than usual threats but also specialized for the targeted object. A defense mechanism must comprehend unusual phenomena or behaviors to successfully handle such advanced threats. To achieve this, various security techniques based on machine learning have been developed recently. Among these, anomaly detection algorithms, which are trained in an unsupervised fashion, can reduce the effort required of security experts and help secure labeled datasets through post analysis. It is further possible to distinguish abnormal behaviors more precisely by training classification models if a sufficient amount of labeled data is obtained through post analysis of the anomaly detection results. In this study, we propose an end-to-end abnormal behavior detection method based on sequential information preserving log embedding algorithms and machine learning-based anomaly detection algorithms. In contrast to other machine learning-based system anomaly detection models, which borrow domain experts’ knowledge to extract significant features from the log data, raw log data are transformed into fixed-size continuous vectors regardless of their length, and these vectors are used to train the anomaly detection models. In experiments on a real system call trace dataset, our proposed log embedding method combined with an unsupervised anomaly detection model yielded favorable performance, up to 0.8708 in terms of AUROC, and this can be further improved up to 0.9745 with supervised classification algorithms if sufficient labeled attack log data become available.
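The pipeline summarized in the description (variable-length traces mapped to fixed-size vectors, an unsupervised anomaly score, evaluation by AUROC) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the hashed-bigram embedding and the distance-to-centroid score are simple stand-ins, not the embedding or anomaly detection algorithms used in the article, and the traces are synthetic rather than taken from ADFA-LD. The names `embed`, `fit_centroid`, `anomaly_score`, `auroc`, and `demo` are hypothetical.

```python
import math
import random

DIM = 64  # fixed embedding size (arbitrary choice for this sketch)


def embed(trace, dim=DIM):
    """Map a variable-length trace of integer call IDs to a fixed-size vector.

    Consecutive call pairs (bigrams) are hashed into `dim` buckets, so some
    ordering information is kept. This is a stand-in, not the article's method.
    """
    vec = [0.0] * dim
    for a, b in zip(trace, trace[1:]):
        vec[(a * 31 + b) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid division by zero
    return [v / norm for v in vec]


def fit_centroid(vectors):
    """Unsupervised 'training': the mean of embeddings assumed to be mostly normal."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def anomaly_score(vec, centroid):
    """Euclidean distance from the centroid; larger means more unusual."""
    return math.sqrt(sum((v - c) ** 2 for v, c in zip(vec, centroid)))


def auroc(scores, labels):
    """Probability that a random attack trace outscores a random normal trace."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def normal_trace(rng):
    """Synthetic normal behavior: a repeating cycle of four call IDs."""
    start = rng.randrange(4)
    return [(start + i) % 4 for i in range(rng.randint(20, 60))]


def attack_trace(rng):
    """Synthetic attack behavior: call IDs in no regular order."""
    return [rng.randrange(12) for _ in range(rng.randint(20, 60))]


def demo(seed=0):
    rng = random.Random(seed)
    train = [embed(normal_trace(rng)) for _ in range(200)]
    centroid = fit_centroid(train)
    traces = [normal_trace(rng) for _ in range(50)]
    traces += [attack_trace(rng) for _ in range(50)]
    labels = [0] * 50 + [1] * 50
    scores = [anomaly_score(embed(t), centroid) for t in traces]
    return auroc(scores, labels)


if __name__ == "__main__":
    print(f"AUROC on synthetic traces: {demo():.4f}")
```

Because the synthetic attack traces are far less regular than the normal ones, this toy setup separates them easily. The score it prints says nothing about performance on real system call data and is not comparable to the AUROC values reported in the description.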
first_indexed 2024-12-14T19:34:52Z
format Article
id doaj.art-019f5acb19e943148b2bd622f276c242
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-14T19:34:52Z
publishDate 2021-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-019f5acb19e943148b2bd622f276c242; 2022-12-21T22:49:55Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2021-01-01; vol. 9, pp. 58088-58101; DOI 10.1109/ACCESS.2021.3071763; IEEE Xplore document 9399070
Czangyeob Kim (https://orcid.org/0000-0002-9784-2399), School of Industrial Management Engineering, Korea University, Seoul, Republic of Korea
Myeongjun Jang (https://orcid.org/0000-0002-9352-4799), Department of Computer Science, University of Oxford, Oxford, U.K.
Seungwan Seo (https://orcid.org/0000-0001-5204-3350), School of Industrial Management Engineering, Korea University, Seoul, Republic of Korea
Kyeongchan Park, School of Industrial Management Engineering, Korea University, Seoul, Republic of Korea
Pilsung Kang (https://orcid.org/0000-0001-7663-3937), School of Industrial Management Engineering, Korea University, Seoul, Republic of Korea
topic System anomaly detection
cyber security
system log embedding
advanced persistent threat
ADFA-LD