SDFP-Growth Algorithm as a Novelty of Association Rule Mining Optimization

Bibliographic Details
Main Authors: Boby Siswanto, Haryono Soeparno, Nesti Fronika Sianipar, Widodo Budiharto
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/10418933/
Description
Summary: An essential element of association rules is the strong confidence values that depend on the support value threshold, which determines the optimum number of datasets. The existing method for determining the support value threshold is manual trial and error: the user picks a support value such as 10%, 30%, or 60% by instinct. An inappropriate support value threshold produces useless frequent patterns, overburdens computer resources, and wastes time. The existing formula for predicting the maximum count of frequent patterns is $2^n - 1$, where $n$ is the number of distinct items in the dataset. This paper proposes a new SDFP-growth algorithm that does not require manual determination of the support threshold value. The SDFP-growth algorithm performs dimensionality reduction on the original dataset, generating smaller level 1 and level 2 datasets and thus automatically producing a dataset with an optimum amount of data and a minimum support value threshold. The proposed formula for predicting the maximum number of frequent patterns becomes $2^{|A|} - 1$, where $|A|$ is always smaller than $n$. Experiments on five different datasets reduced the number of data dimensions by more than 3% on the level 1 dataset and more than 69% on the level 2 dataset while maintaining the confidence value of the strong rules. Execution time improved by more than 2% on the level 1 dataset and more than 94% on the level 2 dataset.
ISSN: 2169-3536
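
The summary's bound of $2^n - 1$ on the number of frequent patterns, and its dependence on the support threshold, can be illustrated with a minimal Python sketch. This is not the SDFP-growth algorithm: it is a brute-force enumeration over a hypothetical toy dataset, and the names `max_frequent_patterns`, `count_frequent_itemsets`, and `transactions` are assumptions introduced only for illustration.

```python
from itertools import combinations


def max_frequent_patterns(n_items: int) -> int:
    """Upper bound on frequent patterns: every non-empty itemset, 2^n - 1."""
    return 2 ** n_items - 1


def count_frequent_itemsets(transactions, min_support: float) -> int:
    """Brute-force count of itemsets whose support is >= min_support.

    Support is the fraction of transactions containing the itemset.
    This is exhaustive enumeration, not FP-growth or SDFP-growth.
    """
    items = sorted(set().union(*transactions))
    total = len(transactions)
    count = 0
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            candidate_set = set(candidate)
            hits = sum(1 for t in transactions if candidate_set <= t)
            if hits / total >= min_support:
                count += 1
    return count


# Hypothetical toy dataset: 4 transactions over n = 3 distinct items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]

n = len(set().union(*transactions))
print("bound:", max_frequent_patterns(n))
for threshold in (0.25, 0.5, 0.75):
    print(threshold, count_frequent_itemsets(transactions, threshold))
```

On this toy data a very low threshold reaches the full bound, while higher thresholds prune patterns. The bound grows exponentially in the number of distinct items, which is why replacing $n$ with a smaller $|A|$ reduces the worst case so sharply.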