A Distributed Method for Fast Mining Frequent Patterns From Big Data

In recent years, knowledge discovery in databases has provided a powerful capability to discover meaningful and useful information. Frequent pattern mining and association rule mining have been extensively studied for numerous real-life applications. In traditional mining algorithms, data are centraliz...


Bibliographic Details
Main Authors:	Peng-Yu Huang, Wan-Shu Cheng, Ju-Chin Chen, Wen-Yu Chung, Young-Lin Chen, Kawuu W. Lin
Format:	Article
Language:	English
Published:	IEEE 2021-01-01
Series:	IEEE Access
Subjects:	Data mining; parallel algorithms; distributed computing
Online Access:	https://ieeexplore.ieee.org/document/9548089/

collection	DOAJ
description	In recent years, knowledge discovery in databases has provided a powerful capability to discover meaningful and useful information. Frequent pattern mining and association rule mining have been extensively studied for numerous real-life applications. In traditional mining algorithms, data are centralized and memory-resident. When these methods are applied to distributed databases, especially in this era of big data, their performance suffers because of the large amount of data, bandwidth limitations, and energy limitations. Hence, data mining in distributed environments has emerged as an important research area. To improve performance, we propose a set of algorithms based on FP-growth that discover frequent patterns (FPs) and provide fast and scalable service in distributed computing environments, together with a compact data structure that stores items and counts to minimize the data transmitted over the network. To ensure completeness and execution capability, DistEclat and BigFIM were chosen for the experimental comparison. Experiments show that the proposed method has superior cost-effectiveness for processing massive datasets and performs well under various experimental conditions. On average, the proposed method required only 33% of the execution time and 45% of the transmission cost of DistEclat, and 23.3% of the execution time and 14.2% of the transmission cost of BigFIM.
first_indexed	2024-12-19T02:17:59Z
id	doaj.art-14dac2691ed942c89126e46cb2bb3fcf
institution	Directory Open Access Journal
issn	2169-3536
last_indexed	2024-12-19T02:17:59Z
spelling	doaj.art-14dac2691ed942c89126e46cb2bb3fcf; 2022-12-21T20:40:21Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2021-01-01; vol. 9, pp. 135144-135159; DOI 10.1109/ACCESS.2021.3115514; article no. 9548089
	Authors: Peng-Yu Huang [0] (https://orcid.org/0000-0001-7126-8096); Wan-Shu Cheng [1]; Ju-Chin Chen [2]; Wen-Yu Chung [3]; Young-Lin Chen [4]; Kawuu W. Lin [5] (https://orcid.org/0000-0002-1669-1008)
	[0] Department of Computer Science and Information Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
	[1] Department of Electrical Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
	[2] Department of Computer Science and Information Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
	[3] Department of Computer Science and Information Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
	[4] Foxconn Technology Group, Taipei, Taiwan
	[5] Department of Computer Science and Information Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
