An Efficient Method for Mining Top-K Closed Sequential Patterns
Main Authors: | Thi-Thiet Pham, Tung Do, Anh Nguyen, Bay Vo, Tzung-Pei Hong |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2020-01-01 |
Series: | IEEE Access |
Subjects: | |
Online Access: | https://ieeexplore.ieee.org/document/9123855/ |
_version_ | 1819276647829864448 |
---|---|
author | Thi-Thiet Pham, Tung Do, Anh Nguyen, Bay Vo, Tzung-Pei Hong |
collection | DOAJ |
description | Mining Closed Sequential Patterns (CSPs) is an essential task in data mining, with many applications. It is used to cope with huge databases or low minimum support (minsup) thresholds, which would otherwise yield very large numbers of sequential patterns. However, tuning the minsup value so that it produces the number of CSPs a user wants is difficult and time-consuming. To address this issue, the TSP algorithm was previously proposed for mining the top-k CSPs, where k is a user-specified parameter: it returns the k CSPs with the highest support values in a database. However, its execution time and memory usage are high. In this paper, an algorithm named TKCS (Top-K Closed Sequences) is proposed to mine the top-k CSPs efficiently. To reduce execution time and memory usage, TKCS represents the data as a vertical bitmap database. It also adopts several strategies for mining the top-k CSPs, such as always selecting the sequential patterns with the highest support values to generate candidate patterns, and storing the top-k CSPs in ascending order of support so that the minsup value can be raised more quickly. The empirical results show that TKCS outperforms TSP in discovering the top-k CSPs in terms of both runtime and memory usage. |
first_indexed | 2024-12-23T23:43:33Z |
format | Article |
id | doaj.art-e189b0d2f1b24eb6a82261f31f3359ef |
institution | Directory of Open Access Journals |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-23T23:43:33Z |
publishDate | 2020-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
doi | 10.1109/ACCESS.2020.3004528 |
volume | 8 |
pages | 118156-118163 |
affiliations | Thi-Thiet Pham: Faculty of Information Technology, Industrial University of Ho Chi Minh City, Ho Chi Minh City, Vietnam; Tung Do: Faculty of Basic Science, Van Lang University, Ho Chi Minh City, Vietnam; Anh Nguyen: Institute of Research and Development, Duy Tan University, Da Nang, Vietnam; Bay Vo: Faculty of Information Technology, Ho Chi Minh City University of Technology (HUTECH), Ho Chi Minh City, Vietnam; Tzung-Pei Hong: Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan |
orcid | Bay Vo: https://orcid.org/0000-0002-9246-4587; Tzung-Pei Hong: https://orcid.org/0000-0001-7305-6492 |
title | An Efficient Method for Mining Top-K Closed Sequential Patterns |
topic | Closed sequential pattern; data mining; sequential pattern; top-k sequential patterns |
url | https://ieeexplore.ieee.org/document/9123855/ |
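The abstract names three ingredients of TKCS: a vertical bitmap database, best-first selection of the highest-support patterns for candidate generation, and a minsup threshold that rises as results accumulate. The Python sketch below illustrates those ideas under stated simplifying assumptions and is not the authors' implementation. Each itemset is assumed to hold a single item (so only sequence extensions are generated), and all function names (`build_vertical_bitmaps`, `s_extend`, `top_k_closed`) are hypothetical. Closedness is checked once every pattern with a given support has been generated. Minsup is raised with a conservative bound (the k-th largest distinct support seen so far) rather than with the paper's ascending top-k list, and ties at the k-th support are kept.

```python
import heapq
from bisect import insort


def build_vertical_bitmaps(db):
    """One bitmap per (item, sequence): bit j is set when the item is at position j.
    Simplifying assumption: every itemset holds a single item."""
    bitmaps = {}
    for sid, seq in enumerate(db):
        for pos, item in enumerate(seq):
            bitmaps.setdefault(item, [0] * len(db))[sid] |= 1 << pos
    return bitmaps


def support(bm):
    # Support = number of sequences with at least one match of the pattern.
    return sum(1 for b in bm if b)


def s_extend(prefix_bm, item_bm):
    """Sequence extension: keep item positions strictly after the prefix's earliest match end."""
    out = []
    for p, q in zip(prefix_bm, item_bm):
        first = p & -p                                   # lowest set bit of the prefix bitmap
        out.append(q & ~((first << 1) - 1) if p else 0)  # clear bits up to and including it
    return out


def is_subsequence(small, big):
    it = iter(big)
    return all(x in it for x in small)  # each `in` consumes the iterator, preserving order


def top_k_closed(db, k):
    """Best-first top-k closed sequential patterns (ties at the k-th support are kept)."""
    item_bms = build_vertical_bitmaps(db)
    heap, counter = [], 0
    seen = []  # distinct supports of generated patterns, kept in ascending order

    def minsup():
        # Every support value is attained by some closed pattern, so the k-th largest
        # distinct support seen is a safe lower bound on the k-th closed support.
        return seen[-k] if len(seen) >= k else 1

    def push(pattern, bm):
        nonlocal counter
        sup = support(bm)
        if sup < minsup():
            return  # this pattern and all its extensions fall below the threshold
        if sup not in seen:
            insort(seen, sup)
        heapq.heappush(heap, (-sup, counter, pattern, bm))  # counter breaks ties
        counter += 1

    for item, bm in item_bms.items():
        push((item,), bm)

    result = []
    while heap and len(result) < k:
        level = -heap[0][0]
        if level < minsup():
            break  # remaining candidates fall below the raised minsup
        batch = []
        while heap and -heap[0][0] == level:  # pop every pattern with this support
            _, _, pattern, bm = heapq.heappop(heap)
            batch.append(pattern)
            for item, ibm in item_bms.items():
                push(pattern + (item,), s_extend(bm, ibm))
        # Closed: no proper super-sequence with the same support (all of them are in batch).
        for p in batch:
            if not any(len(q) > len(p) and is_subsequence(p, q) for q in batch):
                result.append((p, level))
    return result


db = [["a", "b"], ["a", "b"], ["a", "c"]]
print(top_k_closed(db, 2))  # [(('a',), 3), (('a', 'b'), 2)]
```

In this toy database, `b` is not reported because its super-sequence `ab` has the same support (2), which is exactly the closedness condition. The pattern `c` (support 1) is never queued for k = 2 because the raised minsup has already reached 2.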