An Extended AHP-Based Corpus Assessment Approach for Handling Keyword Ranking of NLP: An Example of COVID-19 Corpus Data

Bibliographic Details
Main Authors: Liang-Ching Chen, Kuei-Hu Chang
Format: Article
Language: English
Published: MDPI AG, 2023-07-01
Series: Axioms
Online Access: https://www.mdpi.com/2075-1680/12/8/740
Description: Using corpus assessment approaches to determine and rank the keywords of corpus data is critical for information retrieval (IR) in Natural Language Processing (NLP). In a situation such as the COVID-19 outbreak, it can determine whether people are able to obtain knowledge of the disease rapidly. To meet real-world needs, corpus assessment algorithms must consider multiple parameters and, at the same time, integrate individuals’ subjective evaluation information. However, traditional keyword-list-generating approaches rely on a single parameter (the keyness value) to determine and rank keywords, which is insufficient. To improve on the traditional keyword-list-generating approach, this paper proposes an extended analytic hierarchy process (AHP)-based corpus assessment approach that first refines the corpus data and then uses the AHP method to compute the relative weights of three parameters: keyness, frequency, and range. To verify the proposed approach, this paper uses 53 COVID-19-related environmental science research articles from the Web of Science (WOS) as an empirical example. Compared with the traditional keyword-list-generating approach and the equal weights (EW) method, the proposed approach makes three significant contributions: (1) it uses a machine-based technique to remove function words and meaningless words, optimizing the corpus data; (2) it considers multiple parameters simultaneously; and (3) it integrates the experts’ evaluation results to determine the relative weights of the parameters.
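
The description names the weighting idea but not its mechanics. The Python sketch below illustrates the general AHP technique under stated assumptions; it is not the authors' implementation. It assumes that the weights of keyness, frequency, and range come from Saaty's principal-eigenvector method (approximated here by power iteration), checked with a consistency ratio, and that each keyword is then scored by a weighted sum of min-max-normalized parameter values. The function names (`ahp_weights`, `consistency_ratio`, `rank_keywords`), the pairwise judgments, and the keyword statistics are all hypothetical. The corpus-refinement step is not shown.

```python
from typing import Dict, List, Sequence, Tuple

# Saaty's random consistency index (RI), keyed by matrix order n.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}


def ahp_weights(matrix: Sequence[Sequence[float]], iterations: int = 100) -> List[float]:
    """Approximate the principal eigenvector of a reciprocal pairwise
    comparison matrix by power iteration; the result sums to 1."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        product = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(product)
        w = [x / total for x in product]
    return w


def consistency_ratio(matrix: Sequence[Sequence[float]], w: Sequence[float]) -> float:
    """Return CR = CI / RI, where CI = (lambda_max - n) / (n - 1).
    A CR below 0.1 is conventionally treated as acceptably consistent."""
    n = len(matrix)
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    ri = RANDOM_INDEX[n]
    # For n <= 2, RI is 0 and any reciprocal matrix is consistent.
    return (lambda_max - n) / (n - 1) / ri if ri > 0 else 0.0


def rank_keywords(
    stats: Dict[str, Sequence[float]], weights: Sequence[float]
) -> List[Tuple[str, float]]:
    """Min-max normalize each parameter across keywords, then rank the
    keywords by the weighted sum of their normalized parameter values."""
    words = list(stats)
    k = len(weights)
    lo = [min(stats[word][c] for word in words) for c in range(k)]
    hi = [max(stats[word][c] for word in words) for c in range(k)]

    def norm(value: float, c: int) -> float:
        # A constant column carries no ranking information, so map it to 0.
        return 0.0 if hi[c] == lo[c] else (value - lo[c]) / (hi[c] - lo[c])

    scores = {
        word: sum(weights[c] * norm(stats[word][c], c) for c in range(k))
        for word in words
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical expert judgments on (keyness, frequency, range):
# keyness is 3x as important as frequency and 5x as important as range;
# frequency is 2x as important as range.
pairwise = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]
ahp = ahp_weights(pairwise)
equal = [1 / 3] * 3  # the equal-weights (EW) baseline

# Made-up (keyness, frequency, range) values for three candidate keywords.
stats = {
    "virus": (120.0, 300, 40),
    "emission": (90.0, 150, 20),
    "lockdown": (60.0, 200, 45),
}
print([round(x, 3) for x in ahp])
print(round(consistency_ratio(pairwise, ahp), 4))
print([word for word, _ in rank_keywords(stats, ahp)])
print([word for word, _ in rank_keywords(stats, equal)])
```

With these made-up numbers, both weightings put "virus" first. The AHP weights favor keyness, so they rank "emission" above "lockdown". Equal weights reverse that order because "lockdown" has the widest range. The example shows how the choice of parameter weights, and not keyness alone, can change a keyword list.
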
ISSN: 2075-1680
Collection: Directory of Open Access Journals (DOAJ)
Affiliations: Department of Foreign Languages, R.O.C. Military Academy, Kaohsiung 830, Taiwan; Department of Management Sciences, R.O.C. Military Academy, Kaohsiung 830, Taiwan
Citation: Axioms, vol. 12, no. 8, article 740
DOI: 10.3390/axioms12080740
Keywords: corpus assessment approach; natural language processing (NLP); COVID-19; analytic hierarchy process (AHP); environmental science