LogEvent2vec: LogEvent-to-Vector Based Anomaly Detection for Large-Scale Logs in Internet of Things

Log anomaly detection is an efficient method for managing modern large-scale Internet of Things (IoT) systems. A growing number of works apply natural language processing (NLP) methods, in particular word2vec, to log feature extraction. Word2vec can extract the relevance between words and ve...


Bibliographic Details
Main Authors: Jin Wang, Yangning Tang, Shiming He, Changqing Zhao, Pradip Kumar Sharma, Osama Alfarraj, Amr Tolba
Format: Article
Language: English
Published: MDPI AG 2020-04-01
Series: Sensors
Subjects: log anomaly detection, word2vec, log event, log template, device management, IoT
Online Access: https://www.mdpi.com/1424-8220/20/9/2451
collection DOAJ
description Log anomaly detection is an efficient method for managing modern large-scale Internet of Things (IoT) systems. A growing number of works apply natural language processing (NLP) methods, in particular word2vec, to log feature extraction. Word2vec can extract the relevance between words and vectorize them. However, the computing cost of training word2vec is high. Moreover, anomalies in logs depend not only on individual log messages but also on the log message sequence. Therefore, the word vectors from word2vec cannot be used directly; they must be transformed into log event vectors and then into log sequence vectors. To reduce computational cost and avoid multiple transformations, in this paper we propose an offline feature extraction model, named LogEvent2vec, which takes log events as the input of word2vec to extract the relevance between log events and vectorize log events directly. LogEvent2vec can work with any coordinate transformation method and any anomaly detection model. After obtaining the log event vectors, we transform them into log sequence vectors by bary or tf-idf, and three kinds of supervised models (Random Forests, Naive Bayes, and Neural Networks) are trained to detect anomalies. We have conducted extensive experiments on a real public log dataset from BlueGene/L (BGL). The experimental results demonstrate that LogEvent2vec can reduce computational time by a factor of 30 and improve accuracy compared with word2vec. LogEvent2vec with bary and Random Forest achieves the best F1-score, and LogEvent2vec with tf-idf and Naive Bayes needs the least computational time.
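The step from log event vectors to a log sequence vector can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that "bary" means the unweighted mean (barycenter) of the event vectors in a sequence and that the tf-idf variant is a tf-idf-weighted sum of them. The function names, event IDs (E1 to E3), and two-dimensional vectors are hypothetical toy values standing in for vectors that word2vec would learn.

```python
import math
from collections import Counter

def bary(sequence, event_vectors):
    """Assumed 'bary' transform: unweighted mean of the event vectors."""
    dim = len(next(iter(event_vectors.values())))
    total = [0.0] * dim
    for event in sequence:
        for i, x in enumerate(event_vectors[event]):
            total[i] += x
    return [t / len(sequence) for t in total]

def tfidf_weights(sequence, all_sequences):
    """tf-idf weight of each distinct event in `sequence`.

    `sequence` is expected to be one of `all_sequences`, so every
    document frequency is at least 1.
    """
    counts = Counter(sequence)
    n = len(all_sequences)
    weights = {}
    for event, count in counts.items():
        df = sum(1 for s in all_sequences if event in s)
        tf = count / len(sequence)
        idf = math.log(n / df)
        weights[event] = tf * idf
    return weights

def tfidf_vector(sequence, all_sequences, event_vectors):
    """Assumed tf-idf transform: tf-idf-weighted sum of event vectors."""
    dim = len(next(iter(event_vectors.values())))
    total = [0.0] * dim
    for event, w in tfidf_weights(sequence, all_sequences).items():
        for i, x in enumerate(event_vectors[event]):
            total[i] += w * x
    return total

# Hypothetical toy data: three log sequences over three log events.
event_vectors = {"E1": [1.0, 0.0], "E2": [0.0, 1.0], "E3": [1.0, 1.0]}
sequences = [["E1", "E2", "E1"], ["E1", "E3"], ["E1", "E2"]]

bary_vec = bary(sequences[0], event_vectors)
tfidf_vec = tfidf_vector(sequences[0], sequences, event_vectors)
```

In this toy data E1 occurs in every sequence, so its idf is zero and it contributes nothing to the tf-idf vector, whereas the bary vector weights every occurrence equally. Either sequence vector could then be passed to a supervised classifier such as the ones named in the description.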
first_indexed 2024-03-10T20:13:35Z
id doaj.art-456034c79738462f9e161413bc154746
institution Directory Open Access Journal
issn 1424-8220
spelling doaj.art-456034c79738462f9e161413bc154746; 2023-11-19T22:44:29Z; eng; MDPI AG; Sensors; ISSN 1424-8220; 2020-04-01; vol. 20, no. 9, art. 2451; DOI 10.3390/s20092451
affiliations Jin Wang, Yangning Tang, Shiming He, Changqing Zhao: School of Computer and Communication Engineering, Hunan Provincial Key Laboratory of Intelligent Processing of Big Data on Transportation, Changsha University of Science and Technology, Changsha 410114, China
Pradip Kumar Sharma: Department of Computing Science, University of Aberdeen, Aberdeen AB243FX, UK
Osama Alfarraj, Amr Tolba: Computer Science Department, Community College, King Saud University, Riyadh 11437, Saudi Arabia
topic log anomaly detection
word2vec
log event
log template
device management
IoT