Optimization of Associative Knowledge Graph using TF-IDF based Ranking Score

This study proposes a method for optimizing an associative knowledge graph using TF-IDF based ranking scores. The proposed method calculates TF-IDF weights across all documents and generates a term ranking. Based on the terms with high scores from TF-IDF based ranking, optimized transactions are gener...

Bibliographic Details
Main Authors: Hyun-Jin Kim, Ji-Won Baek, Kyungyong Chung
Format: Article
Language: English
Published: MDPI AG 2020-07-01
Series: Applied Sciences
Subjects: TF-IDF, association rule, apriori, FP-tree, associative knowledge graph
Online Access: https://www.mdpi.com/2076-3417/10/13/4590
_version_ 1827713754120323072
author Hyun-Jin Kim
Ji-Won Baek
Kyungyong Chung
collection DOAJ
description This study proposes a method for optimizing an associative knowledge graph using TF-IDF based ranking scores. The proposed method calculates TF-IDF weights across all documents and generates a term ranking. Optimized transactions are then generated from the terms that score highly in this ranking. News data are first collected through crawling and then converted into a corpus through preprocessing, which removes unnecessary data by converting text to lowercase and removing punctuation marks and stop words. Words are extracted from the document-term matrix, and transactions are generated from them. In the data cleaning process, the Apriori algorithm is applied to generate association rules and build a knowledge graph. To optimize the generated knowledge graph, the proposed method uses the TF-IDF based ranking scores to remove low-scoring terms and recreate the transactions. The association rule algorithm is then applied to the result to create an optimized knowledge model. Performance is evaluated in terms of rule generation speed and the usefulness of the association rules. The proposed method generates association rules about 22 seconds faster, and its lift value, used as the measure of usefulness, is about 0.43 to 2.51 higher than that of each of the conventional association rule algorithms.
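The pruning step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the function names, the unsmoothed natural-log IDF, the choice to sum TF-IDF weights over documents, and the `min_score` threshold are all assumptions made for illustration. It also computes lift for a single term pair directly rather than running Apriori.

```python
import math
from collections import Counter

def tfidf_scores(docs):
    """Sum each term's TF-IDF weight over all documents (assumed ranking score)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / doc_freq[term])  # unsmoothed IDF (assumption)
            scores[term] += tf * idf
    return dict(scores)

def prune_transactions(docs, scores, min_score=0.0):
    """Rebuild transactions, keeping only terms scoring above min_score."""
    return [{t for t in doc if scores[t] > min_score} for doc in docs]

def lift(transactions, a, b):
    """lift(a -> b) = support(a and b) / (support(a) * support(b)).

    Assumes both terms occur in at least one transaction.
    """
    n = len(transactions)
    sup_a = sum(a in t for t in transactions) / n
    sup_b = sum(b in t for t in transactions) / n
    sup_ab = sum(a in t and b in t for t in transactions) / n
    return sup_ab / (sup_a * sup_b)

# Hypothetical preprocessed documents (lowercased, punctuation and stop words removed).
docs = [
    ["said", "market", "stock", "rise"],
    ["said", "market", "stock", "fall"],
    ["said", "weather", "rain"],
    ["said", "weather", "storm"],
]

scores = tfidf_scores(docs)
transactions = prune_transactions(docs, scores)

# Terms ordered from highest to lowest ranking score.
print(sorted(scores, key=lambda t: (-scores[t], t)))
print(lift(transactions, "market", "stock"))
```

In this toy corpus the term "said" appears in every document, so its score is zero and it is dropped from the rebuilt transactions. Fewer items per transaction means fewer candidate itemsets for a rule miner such as Apriori to examine, which is the intuition behind the reported speed-up.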
first_indexed 2024-03-10T18:43:24Z
format Article
id doaj.art-40a144fa6ced4154bc9b4258a661b401
institution Directory Open Access Journal
issn 2076-3417
language English
last_indexed 2024-03-10T18:43:24Z
publishDate 2020-07-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-40a144fa6ced4154bc9b4258a661b401
2023-11-20T05:38:15Z
eng
MDPI AG
Applied Sciences
2076-3417
2020-07-01
10
13
4590
10.3390/app10134590
Optimization of Associative Knowledge Graph using TF-IDF based Ranking Score
Hyun-Jin Kim (Division of Computer Science and Engineering, Kyonggi University, Suwon 16227, Gyeonggi, Korea)
Ji-Won Baek (Department of Computer Science, Kyonggi University, Suwon 16227, Gyeonggi, Korea)
Kyungyong Chung (Division of Computer Science and Engineering, Kyonggi University, Suwon 16227, Gyeonggi, Korea)
https://www.mdpi.com/2076-3417/10/13/4590
TF-IDF
association rule
apriori
FP-tree
associative knowledge graph
title Optimization of Associative Knowledge Graph using TF-IDF based Ranking Score
topic TF-IDF
association rule
apriori
FP-tree
associative knowledge graph
url https://www.mdpi.com/2076-3417/10/13/4590