PrefixSpan Based Pattern Mining Using Time Sliding Weight From Streaming Data
This study proposes PrefixSpan-based pattern mining that applies a time sliding weight to streaming data. To discover sequential patterns, it uses the time sliding weight to build a projected DB Tree. To compute the time sliding weight, a time window is applied to the sequential data to calcula...
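The idea summarized above, a prefix-projection search in which each sequence counts according to how recent its time window is and weak patterns are pruned against a reference value, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the record does not give the weight formula, so the exponential decay in `time_weight`, its `half_life` parameter, the `min_weight` threshold, and the restriction to single-item events are all assumptions, and the paper's projected DB Tree structure is not reproduced.

```python
from collections import defaultdict


def time_weight(age, half_life=2.0):
    """Hypothetical decay: a window's weight halves every `half_life` windows."""
    return 0.5 ** (age / half_life)


def weighted_prefixspan(sequences, min_weight):
    """Mine sequential patterns with time-weighted support.

    sequences: list of (window_index, [items]) pairs, one item per event.
    Returns {pattern tuple: weighted support}, keeping only patterns whose
    weighted support reaches `min_weight` (the assumed reference value).
    """
    latest = max(window for window, _ in sequences)
    weights = [time_weight(latest - window) for window, _ in sequences]
    results = {}

    def mine(prefix, projected):
        # projected: (sequence id, start position) pairs, i.e. the
        # pseudo-projected database for the current prefix.
        support = defaultdict(float)
        next_start = defaultdict(dict)
        for sid, start in projected:
            items = sequences[sid][1]
            seen = set()
            for pos in range(start, len(items)):
                item = items[pos]
                if item not in seen:  # count each sequence once per item
                    seen.add(item)
                    support[item] += weights[sid]
                    next_start[item][sid] = pos + 1
        for item, weight in sorted(support.items()):
            if weight < min_weight:
                continue  # prune patterns below the reference value
            pattern = prefix + (item,)
            results[pattern] = weight
            mine(pattern, list(next_start[item].items()))

    mine((), [(sid, 0) for sid in range(len(sequences))])
    return results


# Toy stream: the window index stands in for arrival time (2 is the newest).
stream = [
    (0, ["a", "b", "c"]),
    (1, ["a", "c"]),
    (2, ["a", "b"]),
    (2, ["b", "c"]),
]
patterns = weighted_prefixspan(stream, min_weight=1.4)
for pattern, weight in sorted(patterns.items()):
    print(pattern, round(weight, 3))
```

In this toy stream the pattern `("a", "c")` occurs in two sequences, but both come from older windows, so its weighted support falls below the threshold and it is dropped. `("a", "b")` also occurs in two sequences, one of them recent, and it is kept.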
Main Authors: | Ji-Soo Kang, Ji-Won Baek, Kyungyong Chung |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2020-01-01 |
Series: | IEEE Access |
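Subjects: | Healthcare, prefixspan, sequential analysis, sequential pattern mining, streaming data, time weight |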
Online Access: | https://ieeexplore.ieee.org/document/9133569/ |
_version_ | 1818444226942205952 |
---|---|
author | Ji-Soo Kang Ji-Won Baek Kyungyong Chung |
author_facet | Ji-Soo Kang Ji-Won Baek Kyungyong Chung |
author_sort | Ji-Soo Kang |
collection | DOAJ |
description | This study proposes PrefixSpan-based pattern mining that applies a time sliding weight to streaming data. To discover sequential patterns, it uses the time sliding weight to build a projected DB Tree. To compute the time sliding weight, a time window is applied to the sequential data, and the label and support of each window are calculated. When the projected DB Tree is built, the time weight calculated for each pattern is inserted into a table, and the tree is updated by deleting every node whose time weight is less than the reference value. Consequently, the tree is re-sorted whenever the data is updated, and this reordering removes less influential patterns according to their time weights. The resulting projected DB Tree can therefore extract influential patterns. The performance of the proposed method is evaluated in three ways. First, the conventional PrefixSpan algorithm is compared with the proposed time-sliding-weight PrefixSpan algorithm in terms of pattern generation time as a function of data size. Second, the fitness of the proposed algorithm is evaluated through cross-validation. Third, the GSP, SPADE, and PrefixSpan algorithms are compared with the proposed algorithm using the F-measure. According to the evaluation, the proposed algorithm improves accuracy by 75% compared with the PrefixSpan algorithm and the sequential pattern algorithms GSP and SPADE. In the F-measure comparison based on precision and recall, it improves performance by about 83%. |
first_indexed | 2024-12-14T19:12:35Z |
format | Article |
id | doaj.art-70e11408e880479aa44a7c6b947b290f |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-14T19:12:35Z |
publishDate | 2020-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-70e11408e880479aa44a7c6b947b290f 2022-12-21T22:50:42Z eng IEEE IEEE Access 2169-3536 2020-01-01 8 124833 124844 10.1109/ACCESS.2020.3007485 9133569 PrefixSpan Based Pattern Mining Using Time Sliding Weight From Streaming Data Ji-Soo Kang 0 https://orcid.org/0000-0003-3433-0094 Ji-Won Baek 1 https://orcid.org/0000-0001-9332-2815 Kyungyong Chung 2 https://orcid.org/0000-0002-6439-9992 Department of Computer Science, Kyonggi University, Suwon, South Korea Department of Computer Science, Kyonggi University, Suwon, South Korea Division of Computer Science and Engineering, Kyonggi University, Suwon, South Korea https://ieeexplore.ieee.org/document/9133569/ Healthcare prefixspan sequential analysis sequential pattern mining streaming data time weight |
spellingShingle | Ji-Soo Kang Ji-Won Baek Kyungyong Chung PrefixSpan Based Pattern Mining Using Time Sliding Weight From Streaming Data IEEE Access Healthcare prefixspan sequential analysis sequential pattern mining streaming data time weight |
title | PrefixSpan Based Pattern Mining Using Time Sliding Weight From Streaming Data |
title_full | PrefixSpan Based Pattern Mining Using Time Sliding Weight From Streaming Data |
title_fullStr | PrefixSpan Based Pattern Mining Using Time Sliding Weight From Streaming Data |
title_full_unstemmed | PrefixSpan Based Pattern Mining Using Time Sliding Weight From Streaming Data |
title_short | PrefixSpan Based Pattern Mining Using Time Sliding Weight From Streaming Data |
title_sort | prefixspan based pattern mining using time sliding weight from streaming data |
topic | Healthcare prefixspan sequential analysis sequential pattern mining streaming data time weight |
url | https://ieeexplore.ieee.org/document/9133569/ |
work_keys_str_mv | AT jisookang prefixspanbasedpatternminingusingtimeslidingweightfromstreamingdata AT jiwonbaek prefixspanbasedpatternminingusingtimeslidingweightfromstreamingdata AT kyungyongchung prefixspanbasedpatternminingusingtimeslidingweightfromstreamingdata |
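
The description field reports a comparison by F-measure based on precision and recall. For reference, the standard definition can be computed as in the sketch below. The function names and the toy pattern sets are illustrative assumptions and are not taken from the paper's evaluation data.

```python
def precision_recall(predicted, relevant):
    """Precision and recall of a set of mined patterns against a reference set."""
    predicted, relevant = set(predicted), set(relevant)
    hits = len(predicted & relevant)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def f_measure(precision, recall, beta=1.0):
    """F-measure; beta=1.0 gives the harmonic mean of precision and recall (F1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical example: four mined patterns, three reference patterns, two in common.
mined = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")}
reference = {("a", "b"), ("b", "c"), ("e", "f")}
p, r = precision_recall(mined, reference)
f1 = f_measure(p, r)
print(p, r, f1)
```

With two of four mined patterns correct and two of three reference patterns found, precision is 1/2, recall is 2/3, and F1 is 4/7.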