PrefixSpan Based Pattern Mining Using Time Sliding Weight From Streaming Data

This study proposes the prefixSpan based pattern mining using time sliding weight from streaming data. To discover sequential patterns, it applies a time sliding weight to create a structure of projected DB Tree. For the time sliding weight, a time window is applied to the sequential data to calcula...

Bibliographic Details
Main Authors: Ji-Soo Kang, Ji-Won Baek, Kyungyong Chung
Format: Article
Language: English
Published: IEEE 2020-01-01
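Series: IEEE Access
Subjects: Healthcare; prefixspan; sequential analysis; sequential pattern mining; streaming data; time weight
Online Access: https://ieeexplore.ieee.org/document/9133569/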
author Ji-Soo Kang
Ji-Won Baek
Kyungyong Chung
collection DOAJ
description This study proposes PrefixSpan-based pattern mining that applies a time sliding weight to streaming data. To discover sequential patterns, it applies the time sliding weight when building the projected DB Tree structure. To compute the time sliding weight, a time window is applied to the sequential data and the label and support of the window are calculated. When the projected DB Tree is built, the time weight calculated for each pattern is inserted in a table. At this point, the tree is updated by deleting every node whose time weight is less than the reference value. For this reason, the tree is sorted again whenever the data is updated. This reordering process removes less influential patterns by applying the time weights, which makes it possible to construct a projected DB Tree from which influential patterns can be extracted. The performance of the proposed method is evaluated in three respects. First, the conventional PrefixSpan algorithm is compared with the proposed time-sliding-weight PrefixSpan algorithm in terms of pattern generation time as a function of data size. Second, the fitness of the proposed algorithm is evaluated through cross-validation. Third, the GSP, SPADE, and PrefixSpan algorithms are compared with the proposed algorithm using the F-measure. In the evaluation, the proposed algorithm improves accuracy by 75% over the PrefixSpan algorithm and the GSP and SPADE sequential pattern algorithms. In the F-measure comparison based on precision and recall, the proposed algorithm improves performance by about 83%.
first_indexed 2024-12-14T19:12:35Z
format Article
id doaj.art-70e11408e880479aa44a7c6b947b290f
institution Directory of Open Access Journals
issn 2169-3536
language English
last_indexed 2024-12-14T19:12:35Z
publishDate 2020-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-70e11408e880479aa44a7c6b947b290f
2022-12-21T22:50:42Z
eng
IEEE
IEEE Access
2169-3536
2020-01-01
vol. 8, pp. 124833-124844
10.1109/ACCESS.2020.3007485
9133569
Ji-Soo Kang (https://orcid.org/0000-0003-3433-0094), Department of Computer Science, Kyonggi University, Suwon, South Korea
Ji-Won Baek (https://orcid.org/0000-0001-9332-2815), Department of Computer Science, Kyonggi University, Suwon, South Korea
Kyungyong Chung (https://orcid.org/0000-0002-6439-9992), Division of Computer Science and Engineering, Kyonggi University, Suwon, South Korea
title PrefixSpan Based Pattern Mining Using Time Sliding Weight From Streaming Data
topic Healthcare
prefixspan
sequential analysis
sequential pattern mining
streaming data
time weight
url https://ieeexplore.ieee.org/document/9133569/
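
The abstract in the description field outlines the idea of weighting patterns by time and deleting those whose weight falls below a reference value. The following is a minimal Python sketch of that idea, not the authors' implementation. The record does not give the weight formula, so the exponential decay per window in `time_weight`, the parameter names (`window`, `decay`, `min_weight`), and the sample data are all assumptions made for illustration. Sequences are simplified to lists of single items, and pruning is done during mining rather than on a stored tree.

```python
from collections import defaultdict


def time_weight(timestamp, now, window, decay):
    """Hypothetical weight: multiplied by `decay` for each full window of age."""
    age_in_windows = (now - timestamp) // window
    return decay ** age_in_windows


def weighted_prefixspan(sequences, now, window=10, decay=0.5, min_weight=1.0):
    """Mine patterns from (timestamp, [item, ...]) pairs.

    Returns a dict mapping each kept pattern (a tuple of items) to its
    time-weighted support. `min_weight` plays the role of the reference value.
    """
    results = {}

    def mine(prefix, projected):
        # projected: list of (weight, suffix) pairs for the current prefix
        totals = defaultdict(float)
        for weight, suffix in projected:
            for item in set(suffix):  # count each sequence once per item
                totals[item] += weight
        for item, total in sorted(totals.items()):
            if total < min_weight:  # prune patterns below the reference value
                continue
            pattern = prefix + (item,)
            results[pattern] = total
            # Project: keep what follows the first occurrence of `item`.
            next_db = [
                (w, s[s.index(item) + 1:]) for w, s in projected if item in s
            ]
            mine(pattern, next_db)

    weighted = [(time_weight(t, now, window, decay), items) for t, items in sequences]
    mine((), weighted)
    return results


# Illustrative data only: (timestamp, sequence of items).
sequences = [
    (95, ["a", "b", "c"]),
    (92, ["a", "c"]),
    (85, ["a", "b"]),
    (60, ["b", "c"]),
]
patterns = weighted_prefixspan(sequences, now=100, window=10, decay=0.5, min_weight=1.5)
for pattern, weight in sorted(patterns.items()):
    print(pattern, weight)
```

In this example the pattern ("b", "c") appears in two of the four sequences, but one of them is old enough that its weight is small, so the pattern falls below the assumed reference value of 1.5 and is dropped, while ("a", "b") is kept.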