DW-PathSim: a distributed computing model for topic-driven weighted meta-path-based similarity measure in a large-scale content-based heterogeneous information network


Bibliographic Details
Main Authors: Phuc Do, Phu Pham
Format: Article
Language: English
Published: Taylor & Francis Group 2019-01-01
Series: Journal of Information and Telecommunication
Online Access: http://dx.doi.org/10.1080/24751839.2018.1516714
Description
Summary: In the past, several studies in information network mining have been designed mainly for single-typed objects and links, i.e., the homogeneous information network (HoIN). These HoIN-based approaches are unsuitable for multi-typed objects and links, known as the heterogeneous information network (HIN). Most real-world networks are not only heterogeneous in structure but also extremely large, and their size is one of the most challenging issues because it directly affects system performance. In this paper, we focus mainly on improving the topic-driven weighted similarity measurement between same-typed objects in an HIN, based on the meta-path-based mechanism; we call this model W-PathSim. We further optimize the performance of W-PathSim on very large-scale HINs by combining it with the distributed computing approach of ‘graph-frames’ on Spark; the combined model is called DW-PathSim. DW-PathSim addresses not only weighted meta-path-based similarity search in HINs but also distributed computation over big networked data. We evaluate DW-PathSim on the real-world DBLP dataset to demonstrate the effectiveness of the proposed models.
ISSN: 2475-1839, 2475-1847
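The summary does not give the W-PathSim formula or describe how its topic-driven weights are derived. As background only, the sketch below computes the standard PathSim measure (Sun et al., VLDB 2011), the meta-path-based similarity that W-PathSim's name indicates it builds on. For a symmetric meta-path such as Author-Paper-Author (A-P-A), the commuting matrix is M = H Hᵀ, where H is the product of the adjacency matrices along the first half of the path, and PathSim(x, y) = 2·M[x][y] / (M[x][x] + M[y][y]). The helper names (`commuting_matrix`, `pathsim`) and the toy matrices are hypothetical illustrations. The code is plain, single-machine Python and does not reproduce the authors' weighting scheme or their Spark/graph-frames distribution.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense matrix product a @ b (a is n x k, b is k x m)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    """Matrix transpose, returned as a list of lists."""
    return [list(row) for row in zip(*a)]


def commuting_matrix(half_path: List[Matrix]) -> Matrix:
    """Commuting matrix M = H H^T of a symmetric meta-path.

    half_path lists the adjacency matrices along the first half of the path,
    e.g. [W_AP] for A-P-A, or [W_AP, W_PV] for A-P-V-P-A.
    """
    h = half_path[0]
    for w in half_path[1:]:
        h = matmul(h, w)
    return matmul(h, transpose(h))


def pathsim(m: Matrix, x: int, y: int) -> float:
    """PathSim(x, y) = 2 * M[x][y] / (M[x][x] + M[y][y]); 0.0 if undefined."""
    denom = m[x][x] + m[y][y]
    return 2.0 * m[x][y] / denom if denom else 0.0


# Hypothetical toy DBLP-like data: 3 authors, 3 papers, 2 venues.
W_AP = [[1, 1, 0],   # author 0 wrote papers 0 and 1
        [1, 1, 0],   # author 1 wrote papers 0 and 1
        [0, 0, 1]]   # author 2 wrote paper 2
W_PV = [[1, 0],      # paper 0 -> venue 0
        [0, 1],      # paper 1 -> venue 1
        [1, 0]]      # paper 2 -> venue 0

M_APA = commuting_matrix([W_AP])
M_APVPA = commuting_matrix([W_AP, W_PV])
print(pathsim(M_APA, 0, 1), pathsim(M_APVPA, 0, 2))
```

Under A-P-A, authors 0 and 1 share all their papers and score 1.0, while author 2 shares none with author 0 and scores 0.0. Under A-P-V-P-A, authors 0 and 2 become partially similar (2/3) because both publish in venue 0. The `denom` guard avoids division by zero for objects with no instances of the path. If the adjacency entries were real-valued weights instead of 0/1, the same code would produce weighted path counts. How W-PathSim actually assigns topic-driven weights is not stated in this summary.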