A Graph-Based Author Name Disambiguation Method and Analysis via Information Theory
Main Authors: | Yingying Ma, Youlong Wu, Chengqiang Lu |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2020-04-01 |
Series: | Entropy |
Subjects: | |
Online Access: | https://www.mdpi.com/1099-4300/22/4/416 |
author | Yingying Ma, Youlong Wu, Chengqiang Lu |
collection | DOAJ |
description | Name ambiguity, which arises because many people share the same name, often degrades the performance of information integration, document retrieval, and web search. In academic data analysis, author name ambiguity likewise reduces the quality of the analysis. To address this problem, the author name disambiguation task divides the documents associated with an author name reference into several groups, each corresponding to one real-life person. Existing methods usually rely on either document attributes or the relationships between documents and co-authors. However, attribute-based feature extraction makes models inflexible, while solutions based on relationship graphs ignore the information contained in the features. In this paper, we propose a novel name disambiguation model based on representation learning that incorporates both attributes and relationships. Experiments on a public real-world dataset demonstrate the effectiveness of our model, and the results show that our solution outperforms several state-of-the-art graph-based methods. We also improve the interpretability of our method through information theory and show that this analysis can help with model selection and with monitoring training progress. |
id | doaj.art-66c209ef86fc4980aa86460fca625ae8 |
institution | Directory of Open Access Journals |
issn | 1099-4300 |
spelling | doaj.art-66c209ef86fc4980aa86460fca625ae8 (2023-11-19T20:56:34Z); eng; MDPI AG; Entropy; ISSN 1099-4300; 2020-04-01; vol. 22, no. 4, art. 416; DOI 10.3390/e22040416; A Graph-Based Author Name Disambiguation Method and Analysis via Information Theory; Yingying Ma (School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China); Youlong Wu (School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China); Chengqiang Lu (University of Science and Technology of China, Hefei 230026, China); https://www.mdpi.com/1099-4300/22/4/416; name disambiguation, graph neural network, clustering analysis, mutual information |
topic | name disambiguation; graph neural network; clustering analysis; mutual information |
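
The record's description frames author name disambiguation as splitting the documents under one name into per-person groups, using both document attributes and co-author relationships, and its keywords mention clustering and mutual information. The Python sketch below illustrates only that framing on invented toy data; it is not the authors' representation-learning model. Assumptions: a single mean-aggregation step over a co-author graph stands in for a learned graph neural network, grouping uses single linkage with a hypothetical cosine-similarity threshold of 0.8, and mutual information is used here only to compare a predicted grouping with ground-truth labels.

```python
import math
from collections import Counter


def propagate(features, edges):
    """One GCN-style smoothing step (an assumption, not the paper's model):
    replace each document's attribute vector by the mean over itself and
    the documents it shares a co-author link with."""
    neighbours = {i: {i} for i in range(len(features))}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    dim = len(features[0])
    out = []
    for i in range(len(features)):
        nbrs = neighbours[i]
        out.append([sum(features[j][k] for j in nbrs) / len(nbrs) for k in range(dim)])
    return out


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def cluster(vectors, threshold):
    """Single-linkage grouping via union-find: documents whose cosine
    similarity reaches the (hypothetical) threshold share a cluster id."""
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(vectors))]


def mutual_information(labels_a, labels_b):
    """Mutual information (in nats) between two labelings of the same items."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    pa, pb = Counter(labels_a), Counter(labels_b)
    return sum((c / n) * math.log(c * n / (pa[a] * pb[b])) for (a, b), c in joint.items())


# Hypothetical toy data: four papers under one ambiguous name.
# Attribute vectors (e.g. topic weights) and co-author links are invented.
docs = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0], [0.0, 0.2, 0.9]]
coauthor_edges = [(0, 1), (2, 3)]
truth = ["person_1", "person_1", "person_2", "person_2"]

raw = cluster(docs, threshold=0.8)                                # attributes only
fused = cluster(propagate(docs, coauthor_edges), threshold=0.8)  # attributes + graph
print(len(set(raw)), len(set(fused)))                             # 3 2
print(round(mutual_information(fused, truth), 4))                 # 0.6931
```

On this toy input the attribute-only grouping separates papers 0 and 1, while smoothing over their shared co-author link merges them into two groups that match the true labels. Note that plain mutual information is also ln 2 for the over-split attribute-only grouping, because splitting a true group does not lower it; normalized variants divide by the label entropies to penalize such over-splitting.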