BERT-Log: Anomaly Detection for System Logs Based on Pre-trained Language Model


Bibliographic Details
Main Authors: Song Chen, Hai Liao
Format: Article
Language: English
Published: Taylor & Francis Group, 2022-12-01
Series: Applied Artificial Intelligence
Online Access: http://dx.doi.org/10.1080/08839514.2022.2145642
description Logs are a primary information resource for fault diagnosis and anomaly detection in large-scale computer systems, but classifying anomalies from system logs is difficult. Recent studies focus on extracting semantic information from unstructured log messages and converting it into word vectors, after which LSTM-based approaches are applied because they suit time-series data. Word2Vec is a widely used encoding method, but it does not take the order of words in a sequence into account. In this article, we propose BERT-Log, which regards a log sequence as a natural-language sequence, uses a pre-trained language model to learn the semantic representation of normal and anomalous logs, and fine-tunes the BERT model with a fully connected neural network to detect anomalies. It can capture all the semantic information in a log sequence, including context and position. BERT-Log achieves the highest performance among all compared methods on the HDFS dataset, with an F1-score of 99.3%. We also propose a new log feature extractor for the BGL dataset that obtains log sequences with a sliding window defined by node ID, window size and step size. BERT-Log detects anomalies on the BGL dataset with an F1-score of 99.4%, a 19% improvement over LogRobust and a 7% improvement over HitAnomaly.
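The BGL feature-extraction step the abstract describes (group logs by node ID, then slide a window of fixed size and step over each node's log sequence) can be sketched in pure Python. The function name, the (node_id, event) input shape, and the handling of sequences shorter than the window are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict

def extract_sequences(logs, window_size, step_size):
    """Sketch of a sliding-window log feature extractor.

    logs: iterable of (node_id, log_event) pairs, assumed in time order.
    Returns a list of log-event sequences, one per window position per node.
    """
    # Group log events by the node that emitted them.
    by_node = defaultdict(list)
    for node_id, event in logs:
        by_node[node_id].append(event)

    # Slide a window of `window_size` events with stride `step_size`
    # over each node's event list; short lists yield one short window.
    sequences = []
    for events in by_node.values():
        last_start = max(len(events) - window_size, 0)
        for start in range(0, last_start + 1, step_size):
            sequences.append(events[start:start + window_size])
    return sequences

logs = [("node1", "E1"), ("node1", "E2"), ("node2", "E9"),
        ("node1", "E3"), ("node1", "E4"), ("node2", "E8")]
seqs = extract_sequences(logs, window_size=2, step_size=1)
# node1 yields [E1,E2], [E2,E3], [E3,E4]; node2 yields [E9,E8]
```

Each resulting window would then be fed to the fine-tuned BERT classifier as one input sequence; the window and step sizes trade detection granularity against the number of sequences produced.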
ISSN: 0883-9514, 1087-6545
Author affiliations: Song Chen (Chengdu Technological University), Hai Liao (Sichuan Vocational College of Information Technology)