An Efficient Method for Mining Top-K Closed Sequential Patterns

The problem of exploiting Closed Sequential Patterns (CSPs) is an essential task in data mining, with many different applications. It is used to resolve the situations of huge databases or low minimum support (minsup) thresholds in mining sequential patterns. However, it is challenging and needs a l...

Bibliographic Details
Main Authors: Thi-Thiet Pham, Tung Do, Anh Nguyen, Bay Vo, Tzung-Pei Hong
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Subjects: Closed sequential pattern, data mining, sequential pattern, top-k sequential patterns
Online Access: https://ieeexplore.ieee.org/document/9123855/
collection DOAJ
description Mining Closed Sequential Patterns (CSPs) is an essential task in data mining, with many different applications. It is used to keep sequential pattern mining manageable when the database is huge or the minimum support (minsup) threshold is low. However, tuning the minsup value so that the miner returns the number of CSPs a user wants is difficult and time-consuming. To address this issue, the TSP algorithm for mining top-k CSPs was previously proposed, with k being a given parameter. It returns the k CSPs that have the highest support values in a database. However, its execution time and memory usage are high. In this paper, an algorithm named TKCS (Top-K Closed Sequences) is proposed to mine the top-k CSPs efficiently. To reduce execution time and memory usage, it represents the data as a vertical bitmap database. It also adopts several strategies while mining the top-k CSPs, such as always choosing the sequential patterns with the greatest support values when generating candidate patterns, and storing the top-k CSPs in ascending order of support so that the minsup value can be raised more quickly. The empirical results show that TKCS performs better than TSP at discovering the top-k CSPs in terms of both runtime and memory usage.
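The two ideas named in the description, a vertical bitmap representation and a top-k buffer that raises minsup as it fills, can be illustrated with a minimal Python sketch. This is not the TKCS implementation: `TopKBuffer`, `offer`, and `support` are hypothetical names, the bitmaps here record only which sequences contain an item (not positions inside a sequence), and the closedness check is omitted.

```python
import heapq


def support(bitmap: int) -> int:
    """Number of set bits, i.e. how many sequences the bitmap marks."""
    return bin(bitmap).count("1")


class TopKBuffer:
    """Hypothetical top-k buffer kept as a min-heap ordered by support.

    The smallest support sits at the heap root, so once k patterns are
    stored the root's support becomes the new minsup threshold.
    """

    def __init__(self, k: int) -> None:
        self.k = k
        self.heap = []   # entries are (support, pattern)
        self.minsup = 1  # assumed starting threshold

    def offer(self, pattern: str, sup: int) -> None:
        if sup < self.minsup:
            return  # cannot enter the current top-k
        heapq.heappush(self.heap, (sup, pattern))
        if len(self.heap) > self.k:
            heapq.heappop(self.heap)  # drop the lowest-support entry
        if len(self.heap) == self.k:
            self.minsup = self.heap[0][0]  # raise the threshold

    def results(self):
        """Stored patterns, highest support first."""
        return [(p, s) for s, p in sorted(self.heap, reverse=True)]


# Toy vertical database over 4 sequences: bit i is set when
# sequence i contains the item (co-occurrence only, no ordering).
bitmaps = {"a": 0b1011, "b": 0b1010}
both = bitmaps["a"] & bitmaps["b"]  # sequences containing a and b

buf = TopKBuffer(k=2)
for name, sup in [("a", 3), ("b", 5), ("c", 4), ("d", 1)]:
    buf.offer(name, sup)
```

With k = 2, the threshold in this toy run climbs as soon as the buffer is full, so the low-support candidate is rejected without further work; that pruning effect is what raising minsup quickly is meant to achieve.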
first_indexed 2024-12-23T23:43:33Z
format Article
id doaj.art-e189b0d2f1b24eb6a82261f31f3359ef
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-23T23:43:33Z
publishDate 2020-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-e189b0d2f1b24eb6a82261f31f3359ef
2022-12-21T17:25:35Z
eng
IEEE
IEEE Access, ISSN 2169-3536
2020-01-01, vol. 8, pp. 118156-118163
doi:10.1109/ACCESS.2020.3004528 (document 9123855)
An Efficient Method for Mining Top-K Closed Sequential Patterns
Thi-Thiet Pham, Faculty of Information Technology, Industrial University of Ho Chi Minh City, Ho Chi Minh City, Vietnam
Tung Do, Faculty of Basic Science, Van Lang University, Ho Chi Minh City, Vietnam
Anh Nguyen, Institute of Research and Development, Duy Tan University, Da Nang, Vietnam
Bay Vo (https://orcid.org/0000-0002-9246-4587), Faculty of Information Technology, Ho Chi Minh City University of Technology (HUTECH), Ho Chi Minh City, Vietnam
Tzung-Pei Hong (https://orcid.org/0000-0001-7305-6492), Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan
https://ieeexplore.ieee.org/document/9123855/
Closed sequential pattern; data mining; sequential pattern; top-k sequential patterns
title An Efficient Method for Mining Top-K Closed Sequential Patterns
topic Closed sequential pattern
data mining
sequential pattern
top-k sequential patterns
url https://ieeexplore.ieee.org/document/9123855/