Y-Rank: A Multi-Feature-Based Keyphrase Extraction Method for Short Text
Keyphrase extraction is a critical task in text information retrieval and is traditionally approached with either supervised or unsupervised methods. Supervised methods generally rely on large corpora, which raises availability problems, while unsupervised methods need no external resources...
Main Authors: | Qiang Liu, Yan Hui, Shangdong Liu, Yimu Ji |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2024-03-01 |
Series: | Applied Sciences |
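Subjects: | keyphrase extraction, word embedding, unsupervised method, phrase quality, information retrieval, natural language processing |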
Online Access: | https://www.mdpi.com/2076-3417/14/6/2510 |
_version_ | 1797242175765872640 |
---|---|
author | Qiang Liu, Yan Hui, Shangdong Liu, Yimu Ji |
collection | DOAJ |
description | Keyphrase extraction is a critical task in text information retrieval and is traditionally approached with either supervised or unsupervised methods. Supervised methods generally rely on large corpora, which raises availability problems, while unsupervised methods need no external resources but suffer from defects such as imperfect statistical features or low accuracy. In short-text scenarios in particular, limited text features often result in low-quality candidate ranking. To address this issue, this paper proposes Y-Rank, a lightweight unsupervised keyphrase extraction method that extracts the average information content of candidate sentences from a single document as its key statistical features and uses similarity-based graph construction to obtain semantic features, yielding high-quality keyphrases and accurate ranking. The top-ranked keyphrases are then obtained by fusing these features. Experimental results on five datasets show that Y-Rank outperforms nine other unsupervised methods, improves on six accuracy metrics (Precision, Recall, F-Measure, MRR, MAP, and Bpref), and achieves its largest improvement in short-text scenarios. |
first_indexed | 2024-04-24T18:35:03Z |
format | Article |
id | doaj.art-e9c4db473a03478f9db8e352a52c61bf |
institution | Directory of Open Access Journals |
issn | 2076-3417 |
language | English |
last_indexed | 2024-04-24T18:35:03Z |
publishDate | 2024-03-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-e9c4db473a03478f9db8e352a52c61bf; 2024-03-27T13:19:57Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2024-03-01; 14(6), 2510; 10.3390/app14062510; Y-Rank: A Multi-Feature-Based Keyphrase Extraction Method for Short Text; Qiang Liu, Yan Hui, Shangdong Liu, Yimu Ji (all: School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China); https://www.mdpi.com/2076-3417/14/6/2510; keyphrase extraction; word embedding; unsupervised method; phrase quality; information retrieval; natural language processing |
title | Y-Rank: A Multi-Feature-Based Keyphrase Extraction Method for Short Text |
topic | keyphrase extraction; word embedding; unsupervised method; phrase quality; information retrieval; natural language processing |
url | https://www.mdpi.com/2076-3417/14/6/2510 |
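
The description in this record names Precision, Recall, F-Measure, MRR, and MAP among the reported accuracy metrics. As a reading aid, the sketch below shows how such metrics are commonly computed for one document's ranked keyphrase list. It is not taken from the paper: the function names, the cutoff `k`, the exact-match comparison, and the choice to normalise average precision by the number of gold keyphrases are all assumptions made for illustration. MRR and MAP would be the means of the per-document reciprocal rank and average precision over a dataset.

```python
from typing import Sequence, Set, Tuple


def precision_recall_f1(ranked: Sequence[str], gold: Set[str], k: int) -> Tuple[float, float, float]:
    """Precision, recall and F-measure of the top-k ranked phrases (exact match)."""
    top_k = ranked[:k]
    hits = sum(1 for phrase in top_k if phrase in gold)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(gold) if gold else 0.0
    # F-measure is the harmonic mean; it is 0 when nothing was matched.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def reciprocal_rank(ranked: Sequence[str], gold: Set[str]) -> float:
    """1 / rank of the first correct phrase, or 0 if none is correct."""
    for rank, phrase in enumerate(ranked, start=1):
        if phrase in gold:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: Sequence[str], gold: Set[str]) -> float:
    """Mean of precision values at each rank where a correct phrase appears.

    Assumption: normalised by the number of gold phrases, and `ranked`
    contains no duplicates.
    """
    hits = 0
    total = 0.0
    for rank, phrase in enumerate(ranked, start=1):
        if phrase in gold:
            hits += 1
            total += hits / rank
    return total / len(gold) if gold else 0.0


if __name__ == "__main__":
    # Hypothetical ranking and gold set, invented for this example.
    ranked = ["keyphrase extraction", "short text", "word embedding", "graph"]
    gold = {"keyphrase extraction", "word embedding", "information retrieval"}
    print(precision_recall_f1(ranked, gold, k=4))
    print(reciprocal_rank(ranked, gold))
    print(average_precision(ranked, gold))
```

Bpref is omitted here because it depends on explicit judgments of non-relevant candidates, which this record does not describe.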