SDFP-Growth Algorithm as a Novelty of Association Rule Mining Optimization
An essential element of association rules is the strong confidence values that depend on the support value threshold, which determines the optimum number of datasets. The existing method for determining the support value threshold is carried out manually by trial and error; the user determines a sup...
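The trial-and-error choice of a support threshold described in the abstract can be illustrated with a minimal sketch. It is not the SDFP-growth algorithm: it is a brute-force enumeration over a hypothetical toy dataset, and the names `TRANSACTIONS`, `frequent_itemsets`, and `confidence` are assumptions made for illustration only.

```python
from itertools import combinations

# Hypothetical toy transactions; not taken from the paper's datasets.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter", "jam"},
]


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support.

    Brute force over all non-empty item combinations, so it is only
    practical for a handful of distinct items.
    """
    items = sorted(set().union(*transactions))
    total = len(transactions)
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = sum(1 for t in transactions if set(candidate) <= t)
            support = count / total
            if support >= min_support:
                result[candidate] = support
    return result


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    both = sum(1 for t in transactions
               if set(antecedent) | set(consequent) <= t)
    ante = sum(1 for t in transactions if set(antecedent) <= t)
    return both / ante if ante else 0.0


if __name__ == "__main__":
    # Manually guessed thresholds, as in the trial-and-error approach.
    for threshold in (0.1, 0.3, 0.6):
        found = frequent_itemsets(TRANSACTIONS, threshold)
        print(f"min_support={threshold:.0%}: {len(found)} frequent itemsets")
```

On this toy data a low threshold keeps every possible itemset, while higher thresholds prune progressively more of them, which is why a poorly guessed value can either flood the user with patterns or discard useful ones.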
Main Authors: | Boby Siswanto, Haryono Soeparno, Nesti Fronika Sianipar, Widodo Budiharto |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2024-01-01 |
Series: | IEEE Access |
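Subjects: | Association rule mining; SDFP-growth algorithm; dimensionality reduction; optimization; FP-tree pruning |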
Online Access: | https://ieeexplore.ieee.org/document/10418933/ |
_version_ | 1797311643817869312 |
---|---|
author | Boby Siswanto; Haryono Soeparno; Nesti Fronika Sianipar; Widodo Budiharto |
author_facet | Boby Siswanto; Haryono Soeparno; Nesti Fronika Sianipar; Widodo Budiharto |
author_sort | Boby Siswanto |
collection | DOAJ |
description | An essential element of association rules is the strong confidence values that depend on the support value threshold, which determines the optimum number of datasets. The existing method for determining the support value threshold is carried out manually by trial and error; the user determines a support value such as 10%, 30%, or 60% according to their instincts. If the support value threshold is inappropriate, it produces useless frequent patterns, overburdens computer resources, and wastes time. The formula for predicting the maximum count of frequent patterns was $2^{n} - 1$, where $n$ is the number of distinct items in the dataset. This paper proposes a new SDFP-growth algorithm that does not require manual determination of the support threshold value. The SDFP-growth algorithm performs dimensionality reduction on the original dataset, generating smaller level 1 and level 2 datasets, and thus automatically produces a dataset with an optimum amount of data and a minimum support value threshold. The proposed formula for predicting the maximum number of frequent patterns becomes $2^{\vert A\vert} - 1$, where $\vert A\vert$ is always smaller than $n$. Experiments were performed on five different datasets; they reduced the number of data dimensions by more than 3% on the level 1 dataset and more than 69% on the level 2 dataset while maintaining the confidence value of the strong rules. Execution time improved by more than 2% on the level 1 dataset and more than 94% on the level 2 dataset. |
first_indexed | 2024-03-08T02:03:38Z |
format | Article |
id | doaj.art-4eb404b4dd974ff595bb801da9644b95 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-03-08T02:03:38Z |
publishDate | 2024-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-4eb404b4dd974ff595bb801da9644b95; 2024-02-14T00:01:51Z; eng; IEEE; IEEE Access; 2169-3536; 2024-01-01; 12; 21491; 21502; 10.1109/ACCESS.2024.3361667; 10418933; SDFP-Growth Algorithm as a Novelty of Association Rule Mining Optimization; Boby Siswanto (0, https://orcid.org/0000-0002-1754-3867); Haryono Soeparno (1); Nesti Fronika Sianipar (2); Widodo Budiharto (3, https://orcid.org/0000-0003-2681-0901); (0) Computer Science Department, BINUS Graduate Program-Doctor of Computer Science, Bina Nusantara University, South Jakarta, Indonesia; (1) Computer Science Department, BINUS Graduate Program-Doctor of Computer Science, Bina Nusantara University, South Jakarta, Indonesia; (2) Biotechnology Department, Faculty of Engineering, Bina Nusantara University, South Jakarta, Indonesia; (3) Computer Science Department, School of Computer Science, Bina Nusantara University, South Jakarta, Indonesia; https://ieeexplore.ieee.org/document/10418933/; Association rule mining; SDFP-growth algorithm; dimensionality reduction; optimization; FP-tree pruning |
title | SDFP-Growth Algorithm as a Novelty of Association Rule Mining Optimization |
topic | Association rule mining; SDFP-growth algorithm; dimensionality reduction; optimization; FP-tree pruning |
url | https://ieeexplore.ieee.org/document/10418933/ |
work_keys_str_mv | AT bobysiswanto sdfpgrowthalgorithmasanoveltyofassociationruleminingoptimization AT haryonosoeparno sdfpgrowthalgorithmasanoveltyofassociationruleminingoptimization AT nestifronikasianipar sdfpgrowthalgorithmasanoveltyofassociationruleminingoptimization AT widodobudiharto sdfpgrowthalgorithmasanoveltyofassociationruleminingoptimization |
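
The two bounds quoted in the description, $2^{n} - 1$ for the original dataset and $2^{\vert A\vert} - 1$ after reduction, can be checked numerically. The sketch below only evaluates the formulas and confirms them by enumeration; it does not implement SDFP-growth, and the function names and the example values of `n` and `a` are hypothetical.

```python
from itertools import combinations


def max_frequent_patterns(num_items):
    """Upper bound on frequent patterns: every non-empty subset of the items."""
    if num_items < 0:
        raise ValueError("num_items must be non-negative")
    return 2 ** num_items - 1


def count_nonempty_itemsets(items):
    """Count non-empty item combinations by explicit enumeration."""
    items = list(items)
    return sum(
        1
        for size in range(1, len(items) + 1)
        for _ in combinations(items, size)
    )


if __name__ == "__main__":
    n = 10  # hypothetical number of distinct items in the original dataset
    a = 4   # hypothetical |A|, the number of items left after reduction
    print("bound with n items:  ", max_frequent_patterns(n))
    print("bound with |A| items:", max_frequent_patterns(a))
    # The closed form agrees with brute-force enumeration on a small item set.
    print(count_nonempty_itemsets("abcd") == max_frequent_patterns(4))
```

Because the bound is exponential in the number of distinct items, each item removed roughly halves the worst-case number of candidate patterns, which is consistent with the large execution-time gains the description reports for the smaller level 2 dataset.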