Summary: | Logs are generated by systems to record the detailed runtime information about system operations, and log analysis plays an important role in anomaly detection at the host or network level. Most existing detection methods require a priori knowledge, which cannot be used to detect the new or unknown anomalies. Moreover, the growing volume of logs poses new challenges to anomaly detection. In this paper, we propose an integrated method using K-prototype clustering and k-NN classification algorithms, which uses a novel clustering-filtering-refinement framework to perform anomaly detection from massive logs. First, we analyze the characteristics of system logs and extract 10 features based on the session information to characterize user behaviors effectively. Second, based on these extracted features, the K-prototype clustering algorithm is applied to partition the data set into different clusters. Then, the obvious normal events which usually present as highly coherent clusters are filtered out, and the others are regarded as anomaly candidates for further analysis. Finally, we design two new distance-based features to measure the local and global anomaly degrees for these anomaly candidates. Based on these two new features, we apply the k-NN classifier to generate accurate detection results. To verify the integrated method, we constructed a log collection and anomaly detection platform in the campus network center of Xi'an Jiaotong University. The experimental results based on the data sets collected from the platform show our method has high detection accuracy and low computational complexity.
|