A Graph-Based Author Name Disambiguation Method and Analysis via Information Theory

Name ambiguity, which arises because many people share an identical name, often degrades the performance of information integration, document retrieval, and web search. In academic data analysis, author name ambiguity usually reduces the performance of the analysis. To solve this problem, an author na...

Bibliographic Details
Main Authors: Yingying Ma, Youlong Wu, Chengqiang Lu
Format: Article
Language: English
Published: MDPI AG 2020-04-01
Series: Entropy
Subjects: name disambiguation, graph neural network, clustering analysis, mutual information
Online Access: https://www.mdpi.com/1099-4300/22/4/416
_version_ 1827719111804715008
author Yingying Ma
Youlong Wu
Chengqiang Lu
collection DOAJ
description Name ambiguity, which arises because many people share an identical name, often degrades the performance of information integration, document retrieval, and web search. In academic data analysis, author name ambiguity usually reduces the performance of the analysis. To solve this problem, the author name disambiguation task divides the documents associated with an author name reference into several parts, each of which corresponds to a real-life person. Existing methods usually use either the attributes of documents or the relationships between documents and co-authors. However, feature-extraction methods based on attributes make models inflexible, while solutions based on relationship graph networks ignore the information contained in the features. In this paper, we propose a novel name disambiguation model based on representation learning that incorporates both attributes and relationships. Experiments on a public real-world dataset demonstrate the effectiveness of our model and show that it outperforms several state-of-the-art graph-based methods. We also increase the interpretability of our method through information theory and show that the analysis can be helpful for model selection and training progress.
first_indexed 2024-03-10T20:36:46Z
format Article
id doaj.art-66c209ef86fc4980aa86460fca625ae8
institution Directory Open Access Journal
issn 1099-4300
language English
last_indexed 2024-03-10T20:36:46Z
publishDate 2020-04-01
publisher MDPI AG
record_format Article
series Entropy
spelling doaj.art-66c209ef86fc4980aa86460fca625ae8
2023-11-19T20:56:34Z
eng
MDPI AG
Entropy
1099-4300
2020-04-01
Volume 22, Issue 4, Article 416
10.3390/e22040416
A Graph-Based Author Name Disambiguation Method and Analysis via Information Theory
Yingying Ma (School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China)
Youlong Wu (School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China)
Chengqiang Lu (University of Science and Technology of China, Hefei 230026, China)
https://www.mdpi.com/1099-4300/22/4/416
name disambiguation
graph neural network
clustering analysis
mutual information
name disambiguation
graph neural network
clustering analysis
mutual information
title A Graph-Based Author Name Disambiguation Method and Analysis via Information Theory
topic name disambiguation
graph neural network
clustering analysis
mutual information
url https://www.mdpi.com/1099-4300/22/4/416