An Integrated Method for Anomaly Detection From Massive System Logs

Logs are generated by systems to record the detailed runtime information about system operations, and log analysis plays an important role in anomaly detection at the host or network level. Most existing detection methods require a priori knowledge, which cannot be used to detect the new or unknown...


Bibliographic Details
Main Authors: Zhaoli Liu, Tao Qin, Xiaohong Guan, Hezhi Jiang, Chenxu Wang
Format: Article
Language: English
Published: IEEE 2018-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/8371223/
author Zhaoli Liu
Tao Qin
Xiaohong Guan
Hezhi Jiang
Chenxu Wang
collection DOAJ
description Logs are generated by systems to record the detailed runtime information about system operations, and log analysis plays an important role in anomaly detection at the host or network level. Most existing detection methods require a priori knowledge, which cannot be used to detect the new or unknown anomalies. Moreover, the growing volume of logs poses new challenges to anomaly detection. In this paper, we propose an integrated method using K-prototype clustering and k-NN classification algorithms, which uses a novel clustering-filtering-refinement framework to perform anomaly detection from massive logs. First, we analyze the characteristics of system logs and extract 10 features based on the session information to characterize user behaviors effectively. Second, based on these extracted features, the K-prototype clustering algorithm is applied to partition the data set into different clusters. Then, the obvious normal events which usually present as highly coherent clusters are filtered out, and the others are regarded as anomaly candidates for further analysis. Finally, we design two new distance-based features to measure the local and global anomaly degrees for these anomaly candidates. Based on these two new features, we apply the k-NN classifier to generate accurate detection results. To verify the integrated method, we constructed a log collection and anomaly detection platform in the campus network center of Xi'an Jiaotong University. The experimental results based on the data sets collected from the platform show our method has high detection accuracy and low computational complexity.
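The clustering-filtering-refinement steps in the description above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The mixed-type dissimilarity is the standard K-prototypes measure, but the coherence rule (`max_mean_dist`), the function names, and the two-number feature vectors passed to the k-NN step are hypothetical choices made only for illustration; the record does not say how the paper defines its 10 session features or its local and global anomaly degrees.

```python
from collections import Counter


def mixed_distance(a, b, num_idx, cat_idx, gamma=1.0):
    """K-prototypes dissimilarity: squared Euclidean distance on the numeric
    fields plus gamma times the number of mismatched categorical fields."""
    numeric = sum((a[i] - b[i]) ** 2 for i in num_idx)
    mismatches = sum(1 for i in cat_idx if a[i] != b[i])
    return numeric + gamma * mismatches


def nearest(point, prototypes, num_idx, cat_idx, gamma=1.0):
    """Return (index, distance) of the prototype closest to `point`."""
    dists = [mixed_distance(point, p, num_idx, cat_idx, gamma) for p in prototypes]
    best = min(range(len(dists)), key=dists.__getitem__)
    return best, dists[best]


def anomaly_candidates(sessions, prototypes, num_idx, cat_idx, max_mean_dist):
    """Filtering step under an ASSUMED coherence rule: a cluster whose mean
    distance to its prototype is at most `max_mean_dist` is treated as
    obviously normal and dropped; members of all other clusters are kept."""
    members = {}
    for s in sessions:
        label, d = nearest(s, prototypes, num_idx, cat_idx)
        members.setdefault(label, []).append((s, d))
    candidates = []
    for items in members.values():
        mean_d = sum(d for _, d in items) / len(items)
        if mean_d > max_mean_dist:
            candidates.extend(s for s, _ in items)
    return candidates


def knn_predict(query, labelled, k=3):
    """Refinement step: majority vote among the k labelled feature vectors
    closest to `query`. The vectors stand in for the two distance-based
    features, whose exact definitions are NOT given in this record."""
    ranked = sorted(
        labelled,
        key=lambda item: sum((q - x) ** 2 for q, x in zip(query, item[0])),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

In this sketch the prototypes are taken as given; a full K-prototypes run would alternate the assignment in `nearest` with recomputing each prototype (mean of numeric fields, mode of categorical fields) until the assignments stop changing.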
first_indexed 2024-12-22T19:15:51Z
format Article
id doaj.art-b0b0f348a59943cd8278b3e564cec1d4
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-22T19:15:51Z
publishDate 2018-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-b0b0f348a59943cd8278b3e564cec1d4, 2022-12-21T18:15:31Z, eng
IEEE, IEEE Access, ISSN 2169-3536, 2018-01-01, vol. 6, pp. 30602-30611
DOI 10.1109/ACCESS.2018.2843336, IEEE document 8371223
An Integrated Method for Anomaly Detection From Massive System Logs
Zhaoli Liu, Tao Qin (https://orcid.org/0000-0003-4874-2567), Xiaohong Guan, Hezhi Jiang, Chenxu Wang
Affiliation (all authors): Key Laboratory for Intelligent Networks and Network Security of the Ministry of Education, Xi’an Jiaotong University, Xi’an, China
title An Integrated Method for Anomaly Detection From Massive System Logs
topic Anomaly detection
clustering-filtering-refinement
K-prototype clustering
k-NN classification
massive logs
url https://ieeexplore.ieee.org/document/8371223/