MISFP-Growth: Hadoop-Based Frequent Pattern Mining with Multiple Item Support

In practice, single item support cannot comprehensively address the complexity of items in large datasets. In this study, we propose a big data analytics framework (named Multiple Item Support Frequent Patterns, MISFP-growth algorithm) that uses Hadoop-based parallel computing to achieve high-effici...


Bibliographic Details
Main Authors:	Chen-Shu Wang, Jui-Yen Chang
Format:	Article
Language:	English
Published:	MDPI AG 2019-05-01
Series:	Applied Sciences
Subjects:	big data analytics; Hadoop MapReduce parallel computing; frequent pattern discovery; multiple item support
Online Access:	https://www.mdpi.com/2076-3417/9/10/2075

author	Chen-Shu Wang Jui-Yen Chang
collection	DOAJ
description	In practice, single item support cannot comprehensively address the complexity of items in large datasets. In this study, we propose a big data analytics framework (named Multiple Item Support Frequent Patterns, MISFP-growth algorithm) that uses Hadoop-based parallel computing to achieve high-efficiency mining of itemsets with multiple item supports (MIS). The proposed architecture consists of two phases. First, in the counting support phase, a Hadoop MapReduce architecture is employed to determine the support for each item. Next, in the analytics phase, sub-transaction blocks are generated according to MIS and the MISFP-growth algorithm identifies the frequency of patterns. To facilitate decision makers in setting MIS, we also propose the concept of classification of item (COI), which classifies items of higher homogeneity into the same class, by which the items inherit class support as their item support. Three experiments were implemented to validate the proposed Hadoop-based MISFP-growth algorithm. The experimental results show approximately 38% reduction in the execution time on parallel architectures. The proposed MISFP-growth algorithm can be implemented on the distributed computing framework. Furthermore, according to the experimental results, the enhanced performance of the proposed algorithm indicates that it could have big data analytics applications.
first_indexed	2024-12-10T09:22:57Z
format	Article
id	doaj.art-435801cb32ad413994eff7a1346cbeab
institution	Directory Open Access Journal
issn	2076-3417
language	English
last_indexed	2024-12-10T09:22:57Z
publishDate	2019-05-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
spelling	doaj.art-435801cb32ad413994eff7a1346cbeab | 2022-12-22T01:54:37Z | eng | MDPI AG | Applied Sciences | 2076-3417 | 2019-05-01 | 9 | 10 | 2075 | 10.3390/app9102075 | app9102075 | MISFP-Growth: Hadoop-Based Frequent Pattern Mining with Multiple Item Support | Chen-Shu Wang (0) | Jui-Yen Chang (1) | Department of Information and Finance Management, National Taipei University of Technology, Taipei 10608, Taiwan | Department of Management Information System, National Chengchi University, Taipei 11605, Taiwan | https://www.mdpi.com/2076-3417/9/10/2075 | big data analytics | Hadoop MapReduce parallel computing | frequent pattern discovery | multiple item support
title	MISFP-Growth: Hadoop-Based Frequent Pattern Mining with Multiple Item Support
topic	big data analytics; Hadoop MapReduce parallel computing; frequent pattern discovery; multiple item support
url	https://www.mdpi.com/2076-3417/9/10/2075
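
The two phases named in the description (MapReduce-style support counting, then mining under multiple item supports with class-inherited thresholds) can be illustrated with a small sketch. This is not the authors' implementation: it runs in a single process instead of on Hadoop, it enumerates candidates by brute force instead of building an FP-tree, and it assumes the commonly used MIS rule that an itemset is frequent when its support is at least the lowest MIS among its items. All function names, item names, and threshold values are hypothetical.

```python
from collections import Counter
from itertools import combinations


def map_phase(transaction):
    """Mapper (simulated): emit (item, 1) for each distinct item in a transaction."""
    return [(item, 1) for item in set(transaction)]


def reduce_phase(pairs):
    """Reducer (simulated): sum the emitted counts per item."""
    counts = Counter()
    for item, one in pairs:
        counts[item] += one
    return dict(counts)


def count_supports(transactions):
    """Counting-support phase: run the mapper over every transaction, then reduce."""
    pairs = [pair for t in transactions for pair in map_phase(t)]
    return reduce_phase(pairs)


def inherit_mis(item_class, class_support):
    """COI idea: every item takes its class's support as its own item support."""
    return {item: class_support[cls] for item, cls in item_class.items()}


def frequent_itemsets(transactions, mis, max_size=2):
    """Brute-force stand-in for the mining step (assumption: not FP-growth).

    An itemset is kept when its relative support reaches the smallest MIS
    among its member items.
    """
    n = len(transactions)
    tx_sets = [set(t) for t in transactions]
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(sorted(mis), size):
            support = sum(1 for t in tx_sets if t.issuperset(candidate)) / n
            if support >= min(mis[i] for i in candidate):
                result[candidate] = support
    return result


# Hypothetical toy data: two item classes with different class supports.
transactions = [
    ["bread", "milk"],
    ["bread", "caviar"],
    ["milk"],
    ["bread", "milk", "caviar"],
]
item_class = {"bread": "staple", "milk": "staple", "caviar": "luxury"}
class_support = {"staple": 0.6, "luxury": 0.3}

counts = count_supports(transactions)
mis = inherit_mis(item_class, class_support)
patterns = frequent_itemsets(transactions, mis)
```

With these toy values, the pair containing the rarer "luxury" item is judged against the lower class threshold, while a pair of "staple" items has to meet the higher one. That is the motivation the description gives for using multiple item supports instead of a single threshold.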


Similar Items