A comparative study of keyword extraction algorithms for English texts


Bibliographic Details
Main Author:	Li Jinye
Format:	Article
Language:	English
Published:	De Gruyter 2021-07-01
Series:	Journal of Intelligent Systems
Subjects:	english text; keyword extraction; tf–idf algorithm; kea
Online Access:	https://doi.org/10.1515/jisys-2021-0040

author	Li Jinye
collection	DOAJ
description	This study analyzed keyword extraction from English texts. First, two commonly used algorithms, the term frequency–inverse document frequency (TF–IDF) algorithm and the keyphrase extraction algorithm (KEA), were introduced. Then, an improved TF–IDF algorithm was designed that modified the word-frequency calculation and combined it with a position weight to improve keyword-extraction performance. Finally, 100 English texts were selected from the British Academic Written English Corpus for the experimental analysis. The results showed that the improved TF–IDF algorithm had the shortest running time, taking only 4.93 s to process the 100 texts, and that the precision of the algorithms decreased as the number of extracted keywords increased. A comparison of the algorithms showed that the improved TF–IDF algorithm performed best when five keywords were extracted from each article, with a precision of 71.2%, a recall of 52.98%, and an F1 score of 60.75%. These results indicate that the improved TF–IDF algorithm is effective for extracting keywords from English texts and could be applied more widely in practice.
first_indexed	2024-04-11T14:57:31Z
format	Article
id	doaj.art-b4bc37e00cf5454ebd2972800844d3fc
institution	Directory of Open Access Journals
issn	2191-026X
language	English
last_indexed	2024-04-11T14:57:31Z
publishDate	2021-07-01
publisher	De Gruyter
record_format	Article
series	Journal of Intelligent Systems
spelling	doaj.art-b4bc37e00cf5454ebd2972800844d3fc | 2022-12-22T04:17:11Z | eng | De Gruyter | Journal of Intelligent Systems | 2191-026X | 2021-07-01 | vol. 30, no. 1, pp. 808–815 | 10.1515/jisys-2021-0040 | A comparative study of keyword extraction algorithms for English texts | Li Jinye (Department of Applied Foreign Languages, Dongguan Polytechnic, No. 3, Daxue Road, Dongguan, Guangdong 523808, China) | https://doi.org/10.1515/jisys-2021-0040 | english text; keyword extraction; tf–idf algorithm; kea
title	A comparative study of keyword extraction algorithms for English texts
topic	english text; keyword extraction; tf–idf algorithm; kea
url	https://doi.org/10.1515/jisys-2021-0040
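The description names TF–IDF, a position weight and precision/recall/F1 but gives no formulas. The Python sketch below shows one common way these pieces fit together; it is not the author's improved algorithm. The plain normalized term frequency, the smoothed IDF, the title-based position rule, the `title_boost` parameter and the tiny stop-word list are all illustrative assumptions. The last check confirms that the reported precision (71.2%) and recall (52.98%) do yield the reported F1 of about 60.75%.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "the", "of", "from", "such", "as", "and", "to", "in", "is", "for"}


def tokenize(text: str) -> list[str]:
    """Lowercase, keep alphabetic tokens, drop stop words (simplifying assumption)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def inverse_document_frequency(corpus: list[str]) -> dict[str, float]:
    """Smoothed IDF: log(N / (1 + df)) + 1, which stays positive for N >= 1."""
    n = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))  # count each term once per document
    return {term: math.log(n / (1 + count)) + 1.0 for term, count in df.items()}


def extract_keywords(doc, idf, title="", k=5, title_boost=2.0):
    """Rank terms by TF * IDF * position weight and return the top k.

    Hypothetical position rule: terms that also occur in the title are
    multiplied by title_boost; every other term gets weight 1.0.
    """
    tokens = tokenize(doc)
    if not tokens:
        return []
    tf = Counter(tokens)
    title_terms = set(tokenize(title))
    scores = {}
    for term, count in tf.items():
        position_weight = title_boost if term in title_terms else 1.0
        scores[term] = (count / len(tokens)) * idf.get(term, 1.0) * position_weight
    # Descending score; ties broken alphabetically so output is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def precision_recall_f1(predicted, gold):
    """Set-based precision, recall and F1 for one document's keyword list."""
    hits = len(set(predicted) & set(gold))
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall, f1_score(precision, recall)


if __name__ == "__main__":
    corpus = [
        "Keyword extraction selects informative words from a text. Keyword lists help indexing.",
        "Corpus statistics such as document frequency reduce the weight of common words.",
        "The text describes a corpus of academic writing.",
    ]
    idf = inverse_document_frequency(corpus)
    print(extract_keywords(corpus[0], idf, title="Keyword extraction", k=2))
    # Consistency check on the precision and recall reported in the description.
    print(round(f1_score(0.712, 0.5298), 4))
```

Running it prints `['keyword', 'extraction']` and `0.6075`. KEA works differently: rather than a fixed scoring formula, it trains a classifier on candidate-phrase features such as TF×IDF and first-occurrence position, so it is not reproduced here.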
