DW-PathSim: a distributed computing model for topic-driven weighted meta-path-based similarity measure in a large-scale content-based heterogeneous information network



Bibliographic Details
Main Authors: Phuc Do, Phu Pham
Format: Article
Language: English
Published: Taylor & Francis Group 2019-01-01
Series: Journal of Information and Telecommunication
Subjects: HIN; large-scale HIN; content-based HIN; similarity measure; Spark
Online Access: http://dx.doi.org/10.1080/24751839.2018.1516714
author Phuc Do
Phu Pham
collection DOAJ
description Most earlier studies in information network mining were designed for networks with a single type of object and link, known as homogeneous information networks (HoINs). These HoIN-based approaches are unsuitable for networks with multiple types of objects and links, known as heterogeneous information networks (HINs). Most real-world networks are not only heterogeneous and complex in structure but also extremely large, and their size is one of the most challenging issues because it directly affects system performance. In this paper, we focus on improving topic-driven weighted similarity measurement between same-typed objects in an HIN, based on the meta-path mechanism; we call this model W-PathSim. We also aim to optimize the performance of W-PathSim on very large-scale HINs by combining it with distributed computing using 'graph-frames' on Spark; we call the combined model DW-PathSim. DW-PathSim addresses both weighted meta-path-based similarity search in HINs and distributed computation over large networked data. We test DW-PathSim on the real-world DBLP dataset to demonstrate the effectiveness of the proposed models.
first_indexed 2024-12-11T22:00:33Z
format Article
id doaj.art-2eb76e98fb1b426d99aa55b5bd414659
institution Directory of Open Access Journals
issn 2475-1839
2475-1847
language English
last_indexed 2024-12-11T22:00:33Z
publishDate 2019-01-01
publisher Taylor & Francis Group
record_format Article
series Journal of Information and Telecommunication
spelling doaj.art-2eb76e98fb1b426d99aa55b5bd414659 | 2022-12-22T00:49:09Z | eng | Taylor & Francis Group | Journal of Information and Telecommunication | 2475-1839 | 2475-1847 | 2019-01-01 | 3 | 1 | 19 | 38 | 10.1080/24751839.2018.1516714 | 1516714 | Phuc Do 0 | Phu Pham 1 | University of Information Technology (UIT), VNU-HCM | University of Information Technology (UIT), VNU-HCM | http://dx.doi.org/10.1080/24751839.2018.1516714 | HIN; large-scale HIN; content-based HIN; similarity measure; Spark
title DW-PathSim: a distributed computing model for topic-driven weighted meta-path-based similarity measure in a large-scale content-based heterogeneous information network
topic HIN
large-scale HIN
content-based HIN
similarity measure
Spark
url http://dx.doi.org/10.1080/24751839.2018.1516714