PatCluster: A Top-Down Log Parsing Method Based on Frequent Words

Logs combine static message-type fields with dynamic variable fields, and the accuracy of log parsing affects the results of subsequent log analysis tasks. This article introduces PatCluster, an offline log parsing method based on frequent words. This method first generates root nod...


Bibliographic Details
Main Authors: Yu Bai, Yongwei Chi, Dan Zhao
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/10024775/
collection DOAJ
description Logs combine static message-type fields with dynamic variable fields, and the accuracy of log parsing affects the results of subsequent log analysis tasks. This article introduces PatCluster, an offline log parsing method based on frequent words. The method first generates root nodes by preprocessing. It then counts word frequencies and takes the most frequent word as the splitting condition to refine the template generated from the root node. The procedure is applied recursively, so that pattern nodes are formed for all elements of each node and the corresponding templates are generated, which completes the log pattern mining. Mining proceeds from coarse to fine and relies on fewer assumptions, and the pattern fitting depth can be controlled by adjusting the termination condition. The optimized algorithm model also considers the maximum extent to which a log template matches the tokens in a log message. Experimental results show that the method improves log parsing quality, achieves higher parsing accuracy than other methods, and is better suited to logs with complex structures.
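The coarse-to-fine procedure outlined in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: grouping root nodes by token count, the names parse, split_node and make_template, the <*> wildcard, and the max_depth and min_support parameters (standing in for the termination condition) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

WILDCARD = "<*>"  # assumed placeholder for variable fields


def make_template(messages):
    # Messages in one node have the same token count (see parse), so
    # compare them column by column and mask positions that differ.
    return [col[0] if len(set(col)) == 1 else WILDCARD for col in zip(*messages)]


def split_node(messages, depth, max_depth, min_support):
    # Hypothetical termination condition: stop at max_depth, or when no
    # word is frequent enough to serve as a splitting condition.
    if depth < max_depth:
        # Number of messages in this node that contain each word.
        freq = Counter(word for msg in messages for word in set(msg))
        # Words present in every message cannot separate the node.
        candidates = {w: c for w, c in freq.items()
                      if min_support <= c < len(messages)}
        if candidates:
            # Most frequent word; ties are broken alphabetically.
            word = min(candidates, key=lambda w: (-candidates[w], w))
            with_word = [m for m in messages if word in m]
            without = [m for m in messages if word not in m]
            return (split_node(with_word, depth + 1, max_depth, min_support)
                    + split_node(without, depth + 1, max_depth, min_support))
    return [make_template(messages)]


def parse(lines, max_depth=4, min_support=2):
    # Assumed preprocessing: one root node per token count.
    roots = defaultdict(list)
    for line in lines:
        tokens = line.split()
        roots[len(tokens)].append(tokens)
    templates = []
    for length in sorted(roots):
        templates += split_node(roots[length], 0, max_depth, min_support)
    return [" ".join(t) for t in templates]


logs = [
    "Connected to 10.0.0.1",
    "Connected to 10.0.0.2",
    "Connected to 10.0.0.3",
    "Disconnected from 10.0.0.1",
    "Disconnected from 10.0.0.2",
]
print(parse(logs))
# prints ['Connected to <*>', 'Disconnected from <*>']
```

Lowering max_depth makes the templates coarser: with max_depth=0 the five messages above collapse into the single template "<*> <*> <*>".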
first_indexed 2024-04-10T09:14:35Z
format Article
id doaj.art-aa950137de4040b69b02c7ac0576eeb6
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-04-10T09:14:35Z
publishDate 2023-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-aa950137de4040b69b02c7ac0576eeb6
2023-02-21T00:02:58Z
eng
IEEE
IEEE Access, ISSN 2169-3536
2023-01-01, vol. 11, pp. 8275-8282
DOI 10.1109/ACCESS.2023.3239012
IEEE document 10024775
PatCluster: A Top-Down Log Parsing Method Based on Frequent Words
Yu Bai, https://orcid.org/0000-0001-5562-5102, Advanced Institute of Information Technology, Peking University, Hangzhou, China
Yongwei Chi, Advanced Institute of Information Technology, Peking University, Hangzhou, China
Dan Zhao, https://orcid.org/0000-0002-1341-7353, Advanced Institute of Information Technology, Peking University, Hangzhou, China
https://ieeexplore.ieee.org/document/10024775/
title PatCluster: A Top-Down Log Parsing Method Based on Frequent Words
topic Log parsing
offline algorithm
PatCluster
frequent words
url https://ieeexplore.ieee.org/document/10024775/