SDFP-Growth Algorithm as a Novelty of Association Rule Mining Optimization

An essential element of association rules is the strong confidence values that depend on the support value threshold, which determines the optimum number of datasets. The existing method for determining the support value threshold is carried out manually by trial and error; the user determines a sup...

Bibliographic Details
Main Authors: Boby Siswanto, Haryono Soeparno, Nesti Fronika Sianipar, Widodo Budiharto
Format: Article
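Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Subjects: Association rule mining; SDFP-growth algorithm; dimensionality reduction; optimization; FP-tree pruning
Online Access: https://ieeexplore.ieee.org/document/10418933/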
_version_ 1797311643817869312
author Boby Siswanto
Haryono Soeparno
Nesti Fronika Sianipar
Widodo Budiharto
collection DOAJ
description An essential element of association rules is a strong confidence value, which depends on the support threshold, which in turn determines the optimum number of datasets. The existing method for determining the support threshold is manual trial and error: the user picks a support value such as 10%, 30%, or 60% by instinct. An inappropriate support threshold produces useless frequent patterns, overburdens computer resources, and wastes time. The existing formula for predicting the maximum number of frequent patterns is $2^{n} - 1$, where $n$ is the number of distinct items in the dataset. This paper proposes a new SDFP-growth algorithm that does not require manual determination of the support threshold. The SDFP-growth algorithm performs dimensionality reduction on the original dataset to generate smaller level 1 and level 2 datasets, thus automatically producing a dataset with an optimum amount of data and a minimum support threshold. The proposed formula for predicting the maximum number of frequent patterns becomes $2^{\vert A \vert} - 1$, where $\vert A \vert$ is always smaller than $n$. Experiments on five different datasets reduced the number of data dimensions by more than 3% on the level 1 dataset and by more than 69% on the level 2 dataset while maintaining the confidence value of the strong rules. Execution time improved by more than 2% on the level 1 dataset and by more than 94% on the level 2 dataset.
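The bound quoted in the description can be checked on a small example. The sketch below is not the SDFP-growth algorithm, whose steps this record does not describe. It is a brute-force enumeration over a hypothetical toy dataset, and the item filter at the end is the standard "drop infrequent single items" step, used here only to show how a smaller item set lowers the $2^{n} - 1$ bound. The names `max_pattern_count`, `frequent_itemsets`, and `transactions` are illustrative assumptions.

```python
from itertools import combinations


def max_pattern_count(n_items):
    """Upper bound on the number of non-empty itemsets over n_items items: 2^n - 1."""
    return 2 ** n_items - 1


def frequent_itemsets(transactions, min_support):
    """Brute-force search: map each itemset (a sorted tuple) to its support,
    keeping only itemsets whose support is at least min_support."""
    items = sorted({item for t in transactions for item in t})
    n_tx = len(transactions)
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            # Support = fraction of transactions that contain every item.
            count = sum(1 for t in transactions if set(candidate) <= t)
            support = count / n_tx
            if support >= min_support:
                result[candidate] = support
    return result


# Hypothetical toy dataset (not taken from the paper).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

n = len({item for t in transactions for item in t})
loose = frequent_itemsets(transactions, 0.25)   # low threshold: many patterns
strict = frequent_itemsets(transactions, 0.50)  # higher threshold: fewer patterns

# Generic filter (an assumption, not the paper's reduction): keep only items
# that are frequent on their own, which shrinks the item set and the bound.
kept_items = {itemset[0] for itemset in strict if len(itemset) == 1}

print(n, max_pattern_count(n))                          # 4 15
print(len(loose), len(strict))                          # 9 5
print(len(kept_items), max_pattern_count(len(kept_items)))  # 3 7
```

With four distinct items the bound is 15. Dropping the one infrequent item lowers it to 7, and the five patterns found at the 50% threshold still fit under that smaller bound.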
first_indexed 2024-03-08T02:03:38Z
format Article
id doaj.art-4eb404b4dd974ff595bb801da9644b95
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-03-08T02:03:38Z
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-4eb404b4dd974ff595bb801da9644b95
2024-02-14T00:01:51Z
eng
IEEE
IEEE Access
2169-3536
2024-01-01
Volume 12, pages 21491-21502
DOI: 10.1109/ACCESS.2024.3361667
IEEE Xplore document ID: 10418933
SDFP-Growth Algorithm as a Novelty of Association Rule Mining Optimization
Boby Siswanto (https://orcid.org/0000-0002-1754-3867), Computer Science Department, BINUS Graduate Program-Doctor of Computer Science, Bina Nusantara University, South Jakarta, Indonesia
Haryono Soeparno, Computer Science Department, BINUS Graduate Program-Doctor of Computer Science, Bina Nusantara University, South Jakarta, Indonesia
Nesti Fronika Sianipar, Biotechnology Department, Faculty of Engineering, Bina Nusantara University, South Jakarta, Indonesia
Widodo Budiharto (https://orcid.org/0000-0003-2681-0901), Computer Science Department, School of Computer Science, Bina Nusantara University, South Jakarta, Indonesia
https://ieeexplore.ieee.org/document/10418933/
title SDFP-Growth Algorithm as a Novelty of Association Rule Mining Optimization
topic Association rule mining
SDFP-growth algorithm
dimensionality reduction
optimization
FP-tree pruning
url https://ieeexplore.ieee.org/document/10418933/