Summary: | This study investigates unsupervised measures of graph homophily, the tendency of similar nodes (typically nodes that share a label) to be connected in a network. Prior work usually quantifies homophily with node labels; however, in many real-world settings these labels are unavailable. We therefore present several unsupervised approaches for measuring homophily that do not require labels.
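For reference, the label-based measures mentioned above are commonly defined as follows. These are the standard forms from the homophily literature, and the study may use variants. Here $G=(V,E)$ is the graph, $y_v$ is the label of node $v$, and $\mathcal{N}(v)$ is its neighborhood:

```latex
h_{\text{edge}} = \frac{\lvert\{(u,v)\in E : y_u = y_v\}\rvert}{\lvert E\rvert},
\qquad
h_{\text{node}} = \frac{1}{\lvert V\rvert}\sum_{v\in V}
  \frac{\lvert\{u\in\mathcal{N}(v) : y_u = y_v\}\rvert}{\lvert\mathcal{N}(v)\rvert}
```

The unsupervised measures replace the condition $y_u = y_v$ with a label-free proxy, such as high similarity between node features or learned representations, or membership in the same cluster.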
Our proposed methods are: (1) computing the similarity of raw node features across all nodes and selecting edges with a similarity threshold; (2) and (3) applying the same similarity-and-threshold procedure to node representations learned without labels by a graph auto-encoder and a graph attention model, respectively; and (4) running unsupervised graph clustering and evaluating graph homophily from the resulting clusters.
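A minimal NumPy sketch of how methods (1) to (4) can turn into per-edge "homophilic or not" predictions and per-node scores. Assumptions, since the summary does not specify them: cosine similarity as the similarity function, an arbitrary threshold of 0.5, and hypothetical function names. Training the graph auto-encoder or graph attention model is not shown; their embedding matrix would simply be passed in place of the raw features, and the cluster assignments for method (4) are assumed to come from some external clustering step.

```python
import numpy as np

def edge_cosine_similarity(Z, edges):
    """Cosine similarity between the two endpoint vectors of every edge.

    Z is a (num_nodes, dim) matrix: raw features for method (1), or
    label-free embeddings (e.g. from a graph auto-encoder) for methods (2)-(3).
    Only edge endpoints are compared, which is all thresholding needs.
    """
    Z = np.asarray(Z, dtype=float)
    a, b = Z[edges[:, 0]], Z[edges[:, 1]]
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    # Zero-norm vectors get similarity 0 instead of a division error.
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

def predict_by_threshold(Z, edges, threshold):
    """Methods (1)-(3): predict an edge homophilic if similarity >= threshold."""
    return edge_cosine_similarity(Z, edges) >= threshold

def predict_by_clusters(clusters, edges):
    """Method (4): predict an edge homophilic if both endpoints share a cluster."""
    clusters = np.asarray(clusters)
    return clusters[edges[:, 0]] == clusters[edges[:, 1]]

def unsupervised_node_homophily(num_nodes, edges, pred):
    """Fraction of each node's incident edges predicted homophilic.

    Edges are treated as undirected; isolated nodes get NaN.
    """
    hom = np.zeros(num_nodes)
    deg = np.zeros(num_nodes)
    w = np.asarray(pred, dtype=float)
    for col in (0, 1):
        np.add.at(hom, edges[:, col], w)
        np.add.at(deg, edges[:, col], 1.0)
    out = np.full(num_nodes, np.nan)
    mask = deg > 0
    out[mask] = hom[mask] / deg[mask]
    return out

# Toy graph: two feature "groups" joined by one bridging edge (1, 2).
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
edges = np.array([[0, 1], [1, 2], [2, 3]])
pred = predict_by_threshold(X, edges, threshold=0.5)  # threshold is arbitrary
print(pred.tolist())  # [True, False, True]
print(unsupervised_node_homophily(4, edges, pred).tolist())  # [1.0, 0.5, 0.5, 1.0]
```

Computing similarities only on existing edges avoids building the full node-by-node similarity matrix; the result for those edges is the same.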
To evaluate the effectiveness of these unsupervised measures, we adopt two evaluation perspectives: edge homophily and node homophily. For edge homophily, we take all edges, use the true labels to mark each edge as homophilic or non-homophilic, and compare this ground truth with the unsupervised measure's prediction for each edge. For node homophily, we first compute each node's homophily from the true labels, then compute its unsupervised node homophily, and finally measure the correlation between the two. This evaluation reveals the strengths and weaknesses of each unsupervised measure and offers insights into how best to evaluate graph homophily when labels are unavailable.
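A hedged sketch of the two evaluation perspectives. The summary does not name the edge-level metric or the correlation coefficient, so accuracy/F1 (with homophilic edges as the positive class) and Pearson correlation are assumptions here, as are all function names. The predicted edges and node scores stand in for the output of any of the unsupervised measures.

```python
import numpy as np

def true_edge_labels(labels, edges):
    """Ground truth: an edge is homophilic iff its endpoints share a label."""
    labels = np.asarray(labels)
    return labels[edges[:, 0]] == labels[edges[:, 1]]

def edge_scores(pred, truth):
    """Edge perspective: accuracy and F1 of predicted homophilic edges."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    acc = float(np.mean(pred == truth))
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom > 0 else 0.0
    return acc, f1

def node_homophily(num_nodes, edges, edge_is_homophilic):
    """Per-node fraction of incident edges that are homophilic (NaN if isolated)."""
    hom = np.zeros(num_nodes)
    deg = np.zeros(num_nodes)
    w = np.asarray(edge_is_homophilic, dtype=float)
    for col in (0, 1):
        np.add.at(hom, edges[:, col], w)
        np.add.at(deg, edges[:, col], 1.0)
    out = np.full(num_nodes, np.nan)
    mask = deg > 0
    out[mask] = hom[mask] / deg[mask]
    return out

def node_correlation(true_h, pred_h):
    """Node perspective: Pearson correlation over nodes where both scores exist."""
    mask = ~np.isnan(true_h) & ~np.isnan(pred_h)
    return float(np.corrcoef(true_h[mask], pred_h[mask])[0, 1])

labels = np.array([0, 0, 1, 1, 0])
edges = np.array([[0, 1], [1, 2], [2, 3], [3, 4]])
pred = np.array([True, False, True, True])  # output of some unsupervised measure
truth = true_edge_labels(labels, edges)
print(edge_scores(pred, truth))  # (0.75, 0.8)
true_h = node_homophily(5, edges, truth)
pred_h = node_homophily(5, edges, pred)
print(node_correlation(true_h, pred_h))
```

If either score vector is constant across nodes, the Pearson coefficient is undefined and `np.corrcoef` returns NaN; a rank correlation would be a reasonable alternative if the scores are heavily tied.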
|