Mining Top-k Frequent Patterns in Large Geosocial Networks: A MNIE-Based Extension Approach

Bibliographic Details
Main Authors: Changben Zhou, Jian Xu, Ming Jiang, Donghang Tang, Sheng Wang
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10073555/
Description
Summary: Frequent pattern mining (FPM) has played an important role in many graph domains, such as bioinformatics and social networks. In this paper, we focus on geo-social graphs, i.e., social networks augmented with geographical information. Besides the exponential time complexity of the problem, we face the challenge of efficient subgraph retrieval, since we are interested in patterns within a specific region of such a network. We therefore formulate the top-k FPM problem in large geo-social networks. Specifically, we devise a novel framework for subgraph retrieval and FPM with a series of optimizations. First, we propose a neighboring-aware R-tree (NaR-Tree) index structure to alleviate the challenge of retrieving subgraphs from a large graph. NaR-Tree is a variant of the R-tree in which each nonleaf node additionally maintains edge statistics for its associated rectangle. Second, we define the concept of minimum image-based support of edges (MNIE). Building on the NaR-Tree and an MNIE-based pattern extension approach, we propose a mining algorithm that addresses the problem of exponentially many candidate patterns. We also present a lazy retrieval strategy to reduce how often subgraphs must be retrieved. Finally, we adopt an edge sampling approach to further accelerate the mining process. Extensive experiments on real-world and synthetic datasets demonstrate the effectiveness and efficiency of our solution.
ISSN:2169-3536
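The summary names minimum image-based support but does not define MNIE. As a rough illustration only, the Python sketch below computes the standard minimum image-based (MNI) support of a pattern from a list of its embeddings (MNI is anti-monotone, which is what lets pattern-growth miners prune extensions), together with a hypothetical edge-level analogue and a simple top-k selection. The function names (mni_support, edge_image_support, top_k) and the edge-level definition are assumptions made for illustration; they are not the paper's MNIE definition or its mining algorithm.

```python
from collections import defaultdict
import heapq
from typing import Dict, List, Sequence, Tuple

# An embedding maps each pattern vertex to a vertex of the data graph.
Embedding = Dict[str, int]


def mni_support(embeddings: Sequence[Embedding]) -> int:
    """Standard minimum image-based (MNI) support: for each pattern vertex,
    count the distinct data vertices it maps to; return the smallest count."""
    if not embeddings:
        return 0
    images: Dict[str, set] = defaultdict(set)
    for emb in embeddings:
        for p_vertex, g_vertex in emb.items():
            images[p_vertex].add(g_vertex)
    return min(len(vertices) for vertices in images.values())


def edge_image_support(embeddings: Sequence[Embedding],
                       pattern_edges: Sequence[Tuple[str, str]]) -> int:
    """HYPOTHETICAL edge-level analogue (an assumption, not the paper's MNIE):
    for each pattern edge (u, v), count the distinct undirected data edges
    {emb[u], emb[v]} it maps to; return the smallest count."""
    if not embeddings or not pattern_edges:
        return 0
    images: Dict[Tuple[str, str], set] = defaultdict(set)
    for emb in embeddings:
        for u, v in pattern_edges:
            # frozenset treats (x, y) and (y, x) as the same undirected edge
            images[(u, v)].add(frozenset((emb[u], emb[v])))
    return min(len(images[edge]) for edge in pattern_edges)


def top_k(scored: List[Tuple[str, int]], k: int) -> List[Tuple[str, int]]:
    """Return the k (pattern, support) pairs with the highest support."""
    return heapq.nlargest(k, scored, key=lambda item: item[1])


# Usage: a star graph with center 0 and leaves 1, 2, 3; pattern is one edge a-b.
star = [{"a": 0, "b": 1}, {"a": 0, "b": 2}, {"a": 0, "b": 3}]
print(mni_support(star))                       # 1
print(edge_image_support(star, [("a", "b")]))  # 3
```

In the star example, MNI counts the shared center vertex only once and reports 1, whereas the hypothetical edge-level count sees three distinct data edges and reports 3. This only shows how vertex-based and edge-based image measures can diverge; it is not how the paper computes MNIE.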