PatCluster: A Top-Down Log Parsing Method Based on Frequent Words
Logs are a combination of static message type fields and dynamic variable fields, and the accuracy of log parsing affects the result of subsequent log analysis tasks. In this regard, an offline log parsing method based on frequent words is introduced: PatCluster. This method first generates root nod...
Main Authors: | Yu Bai, Yongwei Chi, Dan Zhao |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2023-01-01 |
Series: | IEEE Access |
Subjects: | Log parsing; offline algorithm; PatCluster; frequent words |
Online Access: | https://ieeexplore.ieee.org/document/10024775/ |
_version_ | 1797902240840679424 |
---|---|
author | Yu Bai, Yongwei Chi, Dan Zhao |
author_facet | Yu Bai, Yongwei Chi, Dan Zhao |
author_sort | Yu Bai |
collection | DOAJ |
description | Logs combine static message-type fields with dynamic variable fields, and the accuracy of log parsing affects the results of subsequent log analysis tasks. To address this, an offline log parsing method based on frequent words, PatCluster, is introduced. The method first generates root nodes by preprocessing; it then counts word frequencies and extracts the most frequent word as the segmentation condition to refine the template generated by the root node. Applied recursively, this forms pattern nodes for all elements of the nodes and generates the corresponding templates, thereby mining the log patterns. The mining proceeds from coarse to fine, relies on fewer assumptions, and allows the pattern-fitting depth to be controlled by adjusting the termination condition. The optimized algorithm model also considers the maximum extent to which a log template matches the tokens in a log message. Experimental results show that the method improves log parsing quality, achieves higher parsing accuracy than other methods, and is better suited to logs with complex structures. |
first_indexed | 2024-04-10T09:14:35Z |
format | Article |
id | doaj.art-aa950137de4040b69b02c7ac0576eeb6 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-04-10T09:14:35Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-aa950137de4040b69b02c7ac0576eeb6; 2023-02-21T00:02:58Z; eng; IEEE; IEEE Access; 2169-3536; 2023-01-01; 11; 8275; 8282; 10.1109/ACCESS.2023.3239012; 10024775; PatCluster: A Top-Down Log Parsing Method Based on Frequent Words; Yu Bai; 0; https://orcid.org/0000-0001-5562-5102; Yongwei Chi; 1; Dan Zhao; 2; https://orcid.org/0000-0002-1341-7353; Advanced Institute of Information Technology, Peking University, Hangzhou, China; Advanced Institute of Information Technology, Peking University, Hangzhou, China; Advanced Institute of Information Technology, Peking University, Hangzhou, China; Logs are a combination of static message type fields and dynamic variable fields, and the accuracy of log parsing affects the result of subsequent log analysis tasks. In this regard, an offline log parsing method based on frequent words is introduced: PatCluster. This method first generates root nodes by preprocessing; secondly, the frequency of words is counted, and the word with the largest frequency is extracted as the segmentation condition to refine the template generated by the root node. So on recursively, pattern nodes are formed for all elements of the nodes, and corresponding templates are generated to finally achieve the purpose of log pattern mining. The mining process of the log patterns is from coarse to fine which is based on fewer assumptions, and the pattern fitting depth can be controlled by adjusting the termination condition. In optimized algorithm model, we also consider the maximum extent of the log template matching the token in the log message. The experimental results show that this method effectively improves the log parsing quality and has higher log parsing accuracy than other methods, and is more suitable for handling logs with complex structures.; https://ieeexplore.ieee.org/document/10024775/; Log parsing; offline algorithm; PatCluster; frequent words |
spellingShingle | Yu Bai Yongwei Chi Dan Zhao PatCluster: A Top-Down Log Parsing Method Based on Frequent Words IEEE Access Log parsing offline algorithm PatCluster frequent words |
title | PatCluster: A Top-Down Log Parsing Method Based on Frequent Words |
title_sort | patcluster a top down log parsing method based on frequent words |
topic | Log parsing; offline algorithm; PatCluster; frequent words |
url | https://ieeexplore.ieee.org/document/10024775/ |
work_keys_str_mv | AT yubai patclusteratopdownlogparsingmethodbasedonfrequentwords AT yongweichi patclusteratopdownlogparsingmethodbasedonfrequentwords AT danzhao patclusteratopdownlogparsingmethodbasedonfrequentwords |
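
The description field above outlines a coarse-to-fine procedure: build root nodes, pick the most frequent word as a split condition, and recurse until a termination condition holds. The following Python sketch illustrates that general idea only; it is not the authors' implementation. Several details are assumptions made for illustration: root nodes are formed by grouping messages with the same token count, "frequency" means the number of messages in a node that contain a word, and the hypothetical parameters `max_depth` and `min_support` stand in for the termination condition. The function names and the `<*>` wildcard are likewise hypothetical.

```python
from collections import Counter, defaultdict

WILDCARD = "<*>"  # placeholder for variable fields (an assumed convention)


def template_of(messages):
    """Keep a token where every message agrees at that position, else a wildcard.

    Assumes all messages in the node have the same number of tokens.
    """
    return [col[0] if len(set(col)) == 1 else WILDCARD for col in zip(*messages)]


def split_node(messages, depth, max_depth, min_support):
    """Recursively refine one node; returns the templates of its leaf nodes."""
    # Frequency here = number of messages in this node that contain the word.
    freq = Counter(word for msg in messages for word in set(msg))
    # A word found in every message is already fixed in the template, and a
    # word below min_support is treated as a variable, so neither can split.
    candidates = {
        w: c for w, c in freq.items() if min_support <= c < len(messages)
    }
    if depth >= max_depth or not candidates:
        return [template_of(messages)]
    # Most frequent candidate; ties are broken alphabetically for determinism.
    best = min(candidates, key=lambda w: (-candidates[w], w))
    with_word = [m for m in messages if best in m]
    without_word = [m for m in messages if best not in m]
    return (
        split_node(with_word, depth + 1, max_depth, min_support)
        + split_node(without_word, depth + 1, max_depth, min_support)
    )


def mine_templates(lines, max_depth=4, min_support=2):
    """Group messages into root nodes by token count, then refine each root."""
    roots = defaultdict(list)
    for line in lines:
        tokens = line.split()
        if tokens:  # skip blank lines
            roots[len(tokens)].append(tokens)
    templates = []
    for length in sorted(roots):
        for tpl in split_node(roots[length], 0, max_depth, min_support):
            templates.append(" ".join(tpl))
    return templates


if __name__ == "__main__":
    sample = [
        "Connected to 10.0.0.1",
        "Connected to 10.0.0.2",
        "Disconnected from 10.0.0.3",
        "Disconnected from 10.0.0.4",
        "Connected to 10.0.0.5",
        "Session closed",
    ]
    for template in mine_templates(sample):
        print(template)
```

In this sketch, raising `max_depth` lets the tree fit finer patterns, while `max_depth=0` leaves each root node as a single coarse template, which mirrors the description's point that fitting depth is governed by the termination condition.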