BERT-Log: Anomaly Detection for System Logs Based on Pre-trained Language Model


Bibliographic Details
Main Authors: Song Chen, Hai Liao
Format: Article
Language: English
Published: Taylor & Francis Group, 2022-12-01
Series: Applied Artificial Intelligence
Online Access: http://dx.doi.org/10.1080/08839514.2022.2145642
description Logs are a primary information resource for fault diagnosis and anomaly detection in large-scale computer systems, but classifying anomalies from system logs is difficult. Recent studies focus on extracting semantic information from unstructured log messages and converting it into word vectors, after which LSTM-based approaches are applied because they suit time-series data. Word2Vec is a widely used encoding method, but it does not take the order of words in a sequence into account. In this article, we propose BERT-Log, which regards a log sequence as a natural-language sequence, uses a pre-trained language model to learn the semantic representation of normal and anomalous logs, and fine-tunes the BERT model with a fully connected neural network to detect anomalies. It can capture all the semantic information in a log sequence, including context and position. BERT-Log achieves the highest performance among all compared methods on the HDFS dataset, with an F1-score of 99.3%. We also propose a new log feature extractor for the BGL dataset that obtains log sequences with a sliding window defined by node ID, window size and step size. BERT-Log detects anomalies on the BGL dataset with an F1-score of 99.4%, a 19% improvement over LogRobust and a 7% improvement over HitAnomaly.
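The BGL feature-extraction step the abstract describes (group logs by node ID, then slide a window of fixed size and step over each node's log sequence) can be sketched in pure Python. The function name, the (node_id, event) input shape, and the handling of sequences shorter than the window are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict

def extract_sequences(logs, window_size, step_size):
    """Sketch of a sliding-window log feature extractor.

    logs: iterable of (node_id, log_event) pairs, assumed in time order.
    Returns a list of log-event sequences, one per window position per node.
    """
    # Group log events by the node that emitted them.
    by_node = defaultdict(list)
    for node_id, event in logs:
        by_node[node_id].append(event)

    # Slide a window of `window_size` events with stride `step_size`
    # over each node's event list; short lists yield one short window.
    sequences = []
    for events in by_node.values():
        last_start = max(len(events) - window_size, 0)
        for start in range(0, last_start + 1, step_size):
            sequences.append(events[start:start + window_size])
    return sequences

logs = [("node1", "E1"), ("node1", "E2"), ("node2", "E9"),
        ("node1", "E3"), ("node1", "E4"), ("node2", "E8")]
seqs = extract_sequences(logs, window_size=2, step_size=1)
# node1 yields [E1,E2], [E2,E3], [E3,E4]; node2 yields [E9,E8]
```

Each resulting window would then be fed to the fine-tuned BERT classifier as one input sequence; the window and step sizes trade detection granularity against the number of sequences produced.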
ISSN: 0883-9514, 1087-6545
Author affiliations: Song Chen (Chengdu Technological University), Hai Liao (Sichuan Vocational College of Information Technology)