Summary: | In the age of big data, rich and diverse data can be collected from a wide variety of devices, such as those of the Internet of Things. The knowledge hidden in such data is valuable. Frequent pattern mining, a basic data mining method, is widely applied. However, applying traditional frequent pattern mining methods to big data creates bottlenecks because the result sets are very large, which makes the results difficult to use in practice. Mining representative pattern sets has therefore been proposed. Most existing algorithms, however, select representative patterns only after the full set of frequent patterns has been mined, and this two-step framework makes the runtime hard to predict in big data environments. To address these problems, this paper presents a parallel algorithm for online mining of representative pattern sets. Within the MapReduce framework, the algorithm partitions the database horizontally and then applies the online mining algorithm to each partition to mine its local representative pattern set. Several performance optimization strategies are also proposed. Extensive experiments on real-world data show that the proposed algorithm improves time efficiency by one order of magnitude, and that the optimization strategies reduce the execution time to varying degrees.
|