PrefixSpan Based Pattern Mining Using Time Sliding Weight From Streaming Data

This study proposes PrefixSpan-based pattern mining that applies a time sliding weight to streaming data. To discover sequential patterns, it uses the time sliding weight to build a projected DB tree. For the time sliding weight, a time window is applied to the sequential data to calcula...


Bibliographic Details
Main Authors:	Ji-Soo Kang, Ji-Won Baek, Kyungyong Chung
Format:	Article
Language:	English
Published:	IEEE 2020-01-01
Series:	IEEE Access
Subjects:	Healthcare; prefixspan; sequential analysis; sequential pattern mining; streaming data; time weight
Online Access:	https://ieeexplore.ieee.org/document/9133569/

author	Ji-Soo Kang, Ji-Won Baek, Kyungyong Chung
collection	DOAJ
description	This study proposes PrefixSpan-based pattern mining that applies a time sliding weight to streaming data. To discover sequential patterns, it uses the time sliding weight to build a projected DB tree. For the time sliding weight, a time window is applied to the sequential data to calculate the label and support of the window. When the projected DB tree is built, the time weight calculated for each pattern is inserted into a table. The tree is then updated by deleting every node whose time weight is less than the reference value, so the tree is re-sorted whenever the data is updated. This reordering removes less influential patterns by applying time weights, which makes it possible to construct a projected DB tree from which influential patterns can be extracted. The performance of the proposed method is evaluated in three respects. First, the conventional PrefixSpan algorithm is compared with the proposed PrefixSpan algorithm based on time sliding weight in terms of pattern generation time as a function of data size. Second, the fitness of the proposed algorithm is evaluated through cross-validation. Third, the GSP, SPADE, and PrefixSpan algorithms are compared with the proposed algorithm by F-measure. In the evaluation, the proposed algorithm improves accuracy by 75% compared with the PrefixSpan algorithm and the GSP and SPADE sequential pattern algorithms. In the comparison of F-measure, which is based on precision and recall, the proposed algorithm improves performance by about 83%.
first_indexed	2024-12-14T19:12:35Z
format	Article
id	doaj.art-70e11408e880479aa44a7c6b947b290f
institution	Directory Open Access Journal
issn	2169-3536
language	English
last_indexed	2024-12-14T19:12:35Z
publishDate	2020-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-70e11408e880479aa44a7c6b947b290f | 2022-12-21T22:50:42Z | eng | IEEE | IEEE Access | ISSN 2169-3536 | 2020-01-01 | vol. 8, pp. 124833-124844 | DOI 10.1109/ACCESS.2020.3007485 | 9133569 | PrefixSpan Based Pattern Mining Using Time Sliding Weight From Streaming Data | Ji-Soo Kang (https://orcid.org/0000-0003-3433-0094), Department of Computer Science, Kyonggi University, Suwon, South Korea | Ji-Won Baek (https://orcid.org/0000-0001-9332-2815), Department of Computer Science, Kyonggi University, Suwon, South Korea | Kyungyong Chung (https://orcid.org/0000-0002-6439-9992), Division of Computer Science and Engineering, Kyonggi University, Suwon, South Korea | https://ieeexplore.ieee.org/document/9133569/ | Healthcare; prefixspan; sequential analysis; sequential pattern mining; streaming data; time weight
title	PrefixSpan Based Pattern Mining Using Time Sliding Weight From Streaming Data
topic	Healthcare; prefixspan; sequential analysis; sequential pattern mining; streaming data; time weight
url	https://ieeexplore.ieee.org/document/9133569/
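
The description in this record outlines the approach only at a high level: weight the data by time window, compute a time-weighted support for each pattern, and drop patterns whose weight falls below a reference value. The following Python sketch illustrates that general idea and is not the authors' implementation. The weight formula (halving per elapsed window), the names `window_weight` and `weighted_prefixspan`, and the parameters `window`, `decay`, and `min_weight` are all assumptions made for illustration; the record does not state the paper's actual formula or data structures.

```python
from collections import defaultdict


def window_weight(timestamp, now, window, decay=0.5):
    """Hypothetical time weight: multiplied by `decay` per full window of age."""
    age_in_windows = (now - timestamp) // window
    return decay ** age_in_windows


def weighted_prefixspan(db, now, window, min_weight):
    """Mine sequential patterns whose time-weighted support >= min_weight.

    db is a list of (timestamp, sequence) pairs; each sequence is an ordered
    collection of single items. Returns {pattern tuple: weighted support}.
    """
    weights = [window_weight(ts, now, window) for ts, _ in db]
    results = {}

    def mine(prefix, projected):
        # projected holds (sequence index, suffix start) pairs: the projected DB.
        support = defaultdict(float)
        next_start = defaultdict(dict)
        for sid, start in projected:
            seq = db[sid][1]
            seen = set()
            for pos in range(start, len(seq)):
                item = seq[pos]
                if item not in seen:  # count each sequence once per item
                    seen.add(item)
                    support[item] += weights[sid]
                    next_start[item][sid] = pos + 1
        for item in sorted(support):
            if support[item] < min_weight:
                continue  # prune: weight below the reference value
            pattern = prefix + (item,)
            results[pattern] = support[item]
            mine(pattern, list(next_start[item].items()))

    mine((), [(sid, 0) for sid in range(len(db))])
    return results


# Toy data: two recent sequences, one a window old, one two windows old.
toy_db = [(100, "abc"), (100, "ac"), (40, "ab"), (0, "bc")]
patterns = weighted_prefixspan(toy_db, now=100, window=50, min_weight=1.5)
```

In this toy run the pattern `('b', 'c')` occurs in two of the four sequences, but one of them is two windows old, so its weighted support is 1.25 and it is pruned, while `('a', 'c')` occurs only in the two recent sequences and is kept with a weighted support of 2.0.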


Similar Items