CoarSAS2hvec: Heterogeneous Information Network Embedding with Balanced Network Sampling

Heterogeneous information network (HIN) embedding is an important tool for tasks such as node classification, community detection, and recommendation. It aims to find the representations of nodes that preserve the proximity between entities of different nature. A family of approaches that are widely...

Bibliographic Details
Main Authors: Ling Zhan, Tao Jia
Format: Article
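Language: English
Published: MDPI AG 2022-02-01
Series: Entropy
Subjects: heterogeneous information networks, network embedding, context sampling, random walk, information entropy
Online Access: https://www.mdpi.com/1099-4300/24/2/276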
author Ling Zhan
Tao Jia
collection DOAJ
description Heterogeneous information network (HIN) embedding is an important tool for tasks such as node classification, community detection, and recommendation. It aims to find node representations that preserve the proximity between entities of different types. A widely adopted family of approaches applies random walks to generate sequences of heterogeneous contexts, from which the embedding is learned. However, due to the multipartite graph structure of HINs, hub nodes tend to be over-represented relative to their context in the sampled sequence, giving rise to imbalanced samples of the network. Here, we propose a new embedding method, CoarSAS2hvec. Self-avoiding short sequence sampling with an HIN coarsening procedure (CoarSAS) is used to better collect the rich information in an HIN. An optimized loss function is used to improve the performance of the HIN structure embedding. CoarSAS2hvec outperforms nine other methods in node classification and community detection on four real-world data sets. Using entropy as a measure of the amount of information, we confirm that CoarSAS captures richer information about the network than other sampling methods do. Hence, the traditional loss function applied to samples generated by CoarSAS can also yield improved results. Our work addresses a limitation of random-walk-based HIN embedding that has not been emphasized before, which can shed light on a range of problems in HIN analysis.
first_indexed 2024-03-09T22:01:59Z
format Article
id doaj.art-d328d94264ac44b0927b6a95da54efa1
institution Directory Open Access Journal
issn 1099-4300
language English
last_indexed 2024-03-09T22:01:59Z
publishDate 2022-02-01
publisher MDPI AG
record_format Article
series Entropy
spelling doaj.art-d328d94264ac44b0927b6a95da54efa1
2023-11-23T19:48:47Z
eng
MDPI AG
Entropy
1099-4300
2022-02-01
volume 24, issue 2, article 276
10.3390/e24020276
CoarSAS2hvec: Heterogeneous Information Network Embedding with Balanced Network Sampling
Ling Zhan (College of Computer and Information Science, Southwest University, Chongqing 400715, China)
Tao Jia (College of Computer and Information Science, Southwest University, Chongqing 400715, China)
https://www.mdpi.com/1099-4300/24/2/276
heterogeneous information networks
network embedding
context sampling
random walk
information entropy
title CoarSAS2hvec: Heterogeneous Information Network Embedding with Balanced Network Sampling
topic heterogeneous information networks
network embedding
context sampling
random walk
information entropy
url https://www.mdpi.com/1099-4300/24/2/276