Y-Rank: A Multi-Feature-Based Keyphrase Extraction Method for Short Text

Bibliographic Details
Main Authors: Qiang Liu, Yan Hui, Shangdong Liu, Yimu Ji
Format: Article
Language: English
Published: MDPI AG, 2024-03-01
Series: Applied Sciences
Online Access:https://www.mdpi.com/2076-3417/14/6/2510
Affiliation: School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China (all authors)
ISSN: 2076-3417
Volume: 14 (Issue 6), Article 2510
DOI: 10.3390/app14062510
Keywords: keyphrase extraction; word embedding; unsupervised method; phrase quality; information retrieval; natural language processing

Abstract
Keyphrase extraction is a critical task in text information retrieval and has traditionally been tackled with both supervised and unsupervised approaches. Supervised methods generally rely on large corpora, which raises problems of data availability, while unsupervised methods are independent of external resources but suffer from defects such as imperfect statistical features or low accuracy. In short-text scenarios in particular, the limited text features often result in low-quality candidate ranking. To address this issue, this paper proposes Y-Rank, a lightweight unsupervised keyphrase extraction method. Y-Rank extracts the average information content of candidate sentences from a single document as its key statistical features, and it follows a similarity-based graph construction approach to obtain high-quality semantic features of keyphrases and improve ranking accuracy. Finally, the top-ranked keyphrases are obtained by fusing these features. Experimental results on five datasets show that Y-Rank outperforms nine other unsupervised methods, improves on six accuracy metrics (Precision, Recall, F-Measure, MRR, MAP, and Bpref), and achieves its largest improvement in short-text scenarios.
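The abstract names the stages of Y-Rank (a statistical feature from sentence information content, a semantic feature from a similarity graph, and a fusion step) but not their formulas. The Python below is a minimal, hypothetical sketch of that overall shape, not the authors' implementation. Assumptions, all of them illustrative: candidates are n-grams inside stopword-free runs; a sentence's information content is the average self-information -log p(w) of its words, with p(w) estimated from the single input document; Jaccard word overlap stands in for whatever embedding similarity the paper uses (it lists word embedding among its keywords), and weighted PageRank over that graph gives the semantic score; the two scores are fused by a min-max-normalized weighted sum with a hypothetical weight `alpha`. Every function name here is invented for the example.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {
    "a", "an", "the", "of", "in", "on", "for", "and", "or", "to", "is",
    "are", "with", "by", "as", "that", "this", "it", "be", "from", "at",
}


def split_sentences(text):
    """Naive sentence splitter on '.', '!' and '?'."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens; hyphenated words stay whole."""
    return re.findall(r"[a-z0-9-]+", sentence.lower())


def candidates_in(tokens, max_len=3):
    """Hypothetical candidate rule: every n-gram (n <= max_len) inside a stopword-free run."""
    cands, run = set(), []
    for tok in tokens + [None]:  # trailing None flushes the last run
        if tok is not None and tok not in STOPWORDS:
            run.append(tok)
            continue
        for i in range(len(run)):
            for n in range(1, min(max_len, len(run) - i) + 1):
                cands.add(" ".join(run[i:i + n]))
        run = []
    return cands


def sentence_information(tokens, freq, total):
    """Average self-information -log p(w) of a sentence's words (p from this document only)."""
    if not tokens:
        return 0.0
    return sum(-math.log(freq[w] / total) for w in tokens) / len(tokens)


def jaccard(a, b):
    """Word-overlap similarity; a stand-in for embedding cosine similarity."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)


def weighted_pagerank(nodes, damping=0.85, iters=50):
    """Weighted PageRank on the undirected candidate similarity graph."""
    weights = {u: {} for u in nodes}
    for u in nodes:
        for v in nodes:
            if u != v:
                w = jaccard(u, v)
                if w > 0:
                    weights[u][v] = w
    out_sum = {u: sum(weights[u].values()) for u in nodes}
    n = len(nodes)
    score = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        # Weights are symmetric, so each neighbour v of u passes u a share of its score.
        score = {
            u: (1 - damping) / n
            + damping * sum(w / out_sum[v] * score[v] for v, w in weights[u].items())
            for u in nodes
        }
    return score


def normalize(scores):
    """Min-max scale to [0, 1]; constant scores all map to 1.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def rank_keyphrases(text, top_k=5, alpha=0.5):
    """Fuse a statistical and a semantic score; alpha is a hypothetical fusion weight."""
    sentences = [tokenize(s) for s in split_sentences(text)]
    words = [w for toks in sentences for w in toks]
    freq, total = Counter(words), len(words)

    # Statistical feature: mean information content of the sentences containing each candidate.
    infos = {}
    for toks in sentences:
        info = sentence_information(toks, freq, total)
        for cand in candidates_in(toks):
            infos.setdefault(cand, []).append(info)
    if not infos:
        return []
    stat = {c: sum(v) / len(v) for c, v in infos.items()}

    # Semantic feature: centrality of each candidate in the similarity graph.
    sem = weighted_pagerank(sorted(stat))

    ns, nm = normalize(stat), normalize(sem)
    fused = {c: alpha * ns[c] + (1 - alpha) * nm[c] for c in stat}
    return sorted(fused, key=lambda c: (-fused[c], c))[:top_k]


if __name__ == "__main__":
    doc = ("Keyphrase extraction ranks candidate phrases. "
           "Short text offers few statistical features. "
           "Keyphrase extraction for short text is hard.")
    print(rank_keyphrases(doc, top_k=3))
```

Everything is computed from the one input document, which mirrors the abstract's point that the method needs no external corpus. Swapping `jaccard` for cosine similarity over word vectors, or tuning `alpha`, changes the ranking without changing the pipeline's structure.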