BERT-Log: Anomaly Detection for System Logs Based on Pre-trained Language Model
Logs are the primary information resource for fault diagnosis and anomaly detection in large-scale computer systems, but it is hard to classify anomalies from system logs. Recent studies focus on extracting semantic information from unstructured log messages and converting it into word vectors, and the LSTM approach is well suited to such time-series data. Word2Vec is an up-to-date encoding method, but it does not take the order of words in a sequence into account. In this article, we propose BERT-Log, which regards a log sequence as a natural-language sequence, uses a pre-trained language model to learn the semantic representation of normal and anomalous logs, and fine-tunes the BERT model with a fully connected neural network to detect anomalies. It captures all the semantic information in a log sequence, including context and position. BERT-Log achieves the highest performance among all methods on the HDFS dataset, with an F1-score of 99.3%. We also propose a new log feature extractor for the BGL dataset that obtains log sequences with a sliding window defined by node ID, window size and step size. BERT-Log detects anomalies on the BGL dataset with an F1-score of 99.4%, a 19% improvement over LogRobust and a 7% improvement over HitAnomaly.
Main Authors: | Song Chen, Hai Liao |
---|---|
Format: | Article |
Language: | English |
Published: | Taylor & Francis Group, 2022-12-01 |
Series: | Applied Artificial Intelligence |
Online Access: | http://dx.doi.org/10.1080/08839514.2022.2145642 |
_version_ | 1797641061197152256 |
---|---|
author | Song Chen Hai Liao |
author_facet | Song Chen Hai Liao |
author_sort | Song Chen |
collection | DOAJ |
description | Logs are the primary information resource for fault diagnosis and anomaly detection in large-scale computer systems, but it is hard to classify anomalies from system logs. Recent studies focus on extracting semantic information from unstructured log messages and converting it into word vectors, and the LSTM approach is well suited to such time-series data. Word2Vec is an up-to-date encoding method, but it does not take the order of words in a sequence into account. In this article, we propose BERT-Log, which regards a log sequence as a natural-language sequence, uses a pre-trained language model to learn the semantic representation of normal and anomalous logs, and fine-tunes the BERT model with a fully connected neural network to detect anomalies. It captures all the semantic information in a log sequence, including context and position. BERT-Log achieves the highest performance among all methods on the HDFS dataset, with an F1-score of 99.3%. We also propose a new log feature extractor for the BGL dataset that obtains log sequences with a sliding window defined by node ID, window size and step size. BERT-Log detects anomalies on the BGL dataset with an F1-score of 99.4%, a 19% improvement over LogRobust and a 7% improvement over HitAnomaly. |
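The sliding-window feature extractor described in the abstract (grouping BGL log events by node ID, then slicing with a window size and step size) can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function name, the `(node_id, event)` input layout, and the default parameters are assumptions for the example:

```python
from collections import defaultdict

def sliding_windows(logs, window_size=20, step_size=10):
    """Group parsed log events into per-node sliding windows.

    `logs` is an iterable of (node_id, event) pairs in arrival order.
    Returns a list of (node_id, [events...]) sequences, one per window.
    """
    # Bucket events by the node that emitted them.
    by_node = defaultdict(list)
    for node_id, event in logs:
        by_node[node_id].append(event)

    # Slide a fixed-size window over each node's event stream.
    sequences = []
    for node_id, events in by_node.items():
        last_start = max(len(events) - window_size, 0)
        for start in range(0, last_start + 1, step_size):
            sequences.append((node_id, events[start:start + window_size]))
    return sequences
```

In the approach the abstract describes, each windowed event sequence would then be rendered as text and fed to the fine-tuned BERT model, whose fully connected head classifies the window as normal or anomalous.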
first_indexed | 2024-03-11T13:40:05Z |
format | Article |
id | doaj.art-25d5511e07e94635a4a9b525faf4aa8e |
institution | Directory Open Access Journal |
issn | 0883-9514 1087-6545 |
language | English |
last_indexed | 2024-03-11T13:40:05Z |
publishDate | 2022-12-01 |
publisher | Taylor & Francis Group |
record_format | Article |
series | Applied Artificial Intelligence |
spelling | doaj.art-25d5511e07e94635a4a9b525faf4aa8e2023-11-02T13:36:39ZengTaylor & Francis GroupApplied Artificial Intelligence0883-95141087-65452022-12-0136110.1080/08839514.2022.21456422145642BERT-Log: Anomaly Detection for System Logs Based on Pre-trained Language ModelSong Chen0Hai Liao1Chengdu Technological UniversitySichuan Vocational College of Information TechnologyLogs are primary information resource for fault diagnosis and anomaly detection in large-scale computer systems, but it is hard to classify anomalies from system logs. Recent studies focus on extracting semantic information from unstructured log messages and converting it into word vectors. Therefore, LSTM approach is more suitable for time series data. Word2Vec is the up-to-date encoding method, but the order of words in sequences is not taken into account. In this article, we propose BERT-Log, which regards the log sequence as a natural language sequence, use pre-trained language model to learn the semantic representation of normal and anomalous logs, and a fully connected neural network is utilized to fine-tune the BERT model to detect abnormal. It can capture all the semantic information from log sequence including context and position. It has achieved the highest performance among all the methods on HDFS dataset, with an F1-score of 99.3%. We propose a new log feature extractor on BGL dataset to obtain log sequence by sliding window including node ID, window size and step size. BERT-Log approach detects anomalies on BGL dataset with an F1-score of 99.4%. It gives 19% performance improvement compared to LogRobust and 7% performance improvement compared to HitAnomaly.http://dx.doi.org/10.1080/08839514.2022.2145642 |
spellingShingle | Song Chen Hai Liao BERT-Log: Anomaly Detection for System Logs Based on Pre-trained Language Model Applied Artificial Intelligence |
title | BERT-Log: Anomaly Detection for System Logs Based on Pre-trained Language Model |
title_full | BERT-Log: Anomaly Detection for System Logs Based on Pre-trained Language Model |
title_fullStr | BERT-Log: Anomaly Detection for System Logs Based on Pre-trained Language Model |
title_full_unstemmed | BERT-Log: Anomaly Detection for System Logs Based on Pre-trained Language Model |
title_short | BERT-Log: Anomaly Detection for System Logs Based on Pre-trained Language Model |
title_sort | bert log anomaly detection for system logs based on pre trained language model |
url | http://dx.doi.org/10.1080/08839514.2022.2145642 |
work_keys_str_mv | AT songchen bertloganomalydetectionforsystemlogsbasedonpretrainedlanguagemodel AT hailiao bertloganomalydetectionforsystemlogsbasedonpretrainedlanguagemodel |