DW-PathSim: a distributed computing model for topic-driven weighted meta-path-based similarity measure in a large-scale content-based heterogeneous information network
Past studies in information network mining have mainly been designed for networks with single-typed objects and links, called homogeneous information networks (HoINs). These HoIN-based approaches are unsuitable for networks with multi-typed objects and links, known as the heterogeneous inform...
Main Authors: | Phuc Do, Phu Pham |
---|---|
Format: | Article |
Language: | English |
Published: | Taylor & Francis Group, 2019-01-01 |
Series: | Journal of Information and Telecommunication |
Subjects: | HIN; large-scale HIN; content-based HIN; similarity measure; Spark |
Online Access: | http://dx.doi.org/10.1080/24751839.2018.1516714 |
_version_ | 1818540832132694016 |
---|---|
author | Phuc Do; Phu Pham |
collection | DOAJ |
description | Past studies in information network mining have mainly been designed for networks with single-typed objects and links, called homogeneous information networks (HoINs). These HoIN-based approaches are unsuitable for networks with multi-typed objects and links, known as heterogeneous information networks (HINs). Most real-world networks are not only heterogeneous and complex in composition but also extremely large. The sheer size of these networks is one of the most challenging issues, because it directly affects system performance. In this paper, we focus on improving topic-driven weighted similarity measurement between same-typed objects in an HIN, using a meta-path-based mechanism called W-PathSim. We also aim to optimize the performance of W-PathSim on very large-scale HINs by combining it with distributed computing using ‘graph-frames’ on Spark; we call the resulting model DW-PathSim. DW-PathSim addresses both the problem of weighted meta-path-based similarity search in HINs and the problem of distributed computing on large networked data. We test the DW-PathSim model on the real-world DBLP dataset to demonstrate the effectiveness of the proposed models. |
first_indexed | 2024-12-11T22:00:33Z |
format | Article |
id | doaj.art-2eb76e98fb1b426d99aa55b5bd414659 |
institution | Directory Open Access Journal |
issn | 2475-1839 2475-1847 |
language | English |
last_indexed | 2024-12-11T22:00:33Z |
publishDate | 2019-01-01 |
publisher | Taylor & Francis Group |
record_format | Article |
series | Journal of Information and Telecommunication |
spelling | doaj.art-2eb76e98fb1b426d99aa55b5bd414659; 2022-12-22T00:49:09Z; eng; Taylor & Francis Group; Journal of Information and Telecommunication; ISSN 2475-1839, 2475-1847; 2019-01-01; vol. 3, no. 1, pp. 19-38; doi:10.1080/24751839.2018.1516714; Phuc Do (University of Information Technology (UIT), VNU-HCM); Phu Pham (University of Information Technology (UIT), VNU-HCM); keywords: HIN, large-scale HIN, content-based HIN, similarity measure, Spark |
title | DW-PathSim: a distributed computing model for topic-driven weighted meta-path-based similarity measure in a large-scale content-based heterogeneous information network |
topic | HIN; large-scale HIN; content-based HIN; similarity measure; Spark |
url | http://dx.doi.org/10.1080/24751839.2018.1516714 |