Parallel Implementation of Apriori Algorithm Based on MapReduce

Searching frequent patterns in transactional databases is considered as one of the most important data mining problems and Apriori is one of the typical algorithms for this task. Developing fast and efficient algorithms that can handle large volumes of data becomes a challenging task due to the larg...

Full description

Bibliographic Details
Main Authors: Ning Li, Li Zeng, Qing He, Zhongzhi Shi
Format: Article
Language:English
Published: Springer 2013-04-01
Series:International Journal of Networked and Distributed Computing (IJNDC)
Subjects:
Online Access:https://www.atlantis-press.com/article/8360.pdf
Description
Summary:Searching frequent patterns in transactional databases is considered as one of the most important data mining problems and Apriori is one of the typical algorithms for this task. Developing fast and efficient algorithms that can handle large volumes of data becomes a challenging task due to the large databases. In this paper, we implement a parallel Apriori algorithm based on MapReduce, which is a framework for processing huge datasets on certain kinds of distributable problems using a large number of computers (nodes). The experimental results demonstrate that the proposed algorithm can scale well and efficiently process large datasets on commodity hardware.
ISSN:2211-7946