A Parallel Approach for Frequent Subgraph Mining in a Single Large Graph Using Spark

Frequent subgraph mining (FSM) plays an important role in graph mining, attracting a great deal of attention in many areas, such as bioinformatics, web data mining and social networks. In this paper, we propose SSiGraM (Spark based Single Graph Mining), a Spark based parallel frequent subgraph minin...

Πλήρης περιγραφή

Λεπτομέρειες βιβλιογραφικής εγγραφής
Κύριοι συγγραφείς: Fengcai Qiao, Xin Zhang, Pei Li, Zhaoyun Ding, Shanshan Jia, Hui Wang
Μορφή: Άρθρο
Γλώσσα:English
Έκδοση: MDPI AG 2018-02-01
Σειρά:Applied Sciences
Θέματα:
Διαθέσιμο Online:http://www.mdpi.com/2076-3417/8/2/230
Περιγραφή
Περίληψη:Frequent subgraph mining (FSM) plays an important role in graph mining, attracting a great deal of attention in many areas, such as bioinformatics, web data mining and social networks. In this paper, we propose SSiGraM (Spark based Single Graph Mining), a Spark based parallel frequent subgraph mining algorithm in a single large graph. Aiming to approach the two computational challenges of FSM, we conduct the subgraph extension and support evaluation parallel across all the distributed cluster worker nodes. In addition, we also employ a heuristic search strategy and three novel optimizations: load balancing, pre-search pruning and top-down pruning in the support evaluation process, which significantly improve the performance. Extensive experiments with four different real-world datasets demonstrate that the proposed algorithm outperforms the existing GraMi (Graph Mining) algorithm by an order of magnitude for all datasets and can work with a lower support threshold.
ISSN:2076-3417