A solution and practice for combining multi-source heterogeneous data to construct enterprise knowledge graph

The knowledge graph is one of the essential infrastructures of artificial intelligence. It is a challenge for knowledge engineering to construct a high-quality domain knowledge graph for multi-source heterogeneous data. We propose a complete process framework for constructing a knowledge graph that...


Bibliographic Details
Main Authors: Chenwei Yan, Xinyue Fang, Xiaotong Huang, Chenyi Guo, Ji Wu
Format: Article
Language: English
Published: Frontiers Media S.A. 2023-09-01
Series: Frontiers in Big Data
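Subjects: knowledge graph construction; heterogeneous data; knowledge graph update; enterprise knowledge graph; graph database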
Online Access: https://www.frontiersin.org/articles/10.3389/fdata.2023.1278153/full
_version_ 1827804214916546560
author Chenwei Yan
Xinyue Fang
Xiaotong Huang
Chenyi Guo
Ji Wu
author_facet Chenwei Yan
Xinyue Fang
Xiaotong Huang
Chenyi Guo
Ji Wu
author_sort Chenwei Yan
collection DOAJ
description The knowledge graph is one of the essential infrastructures of artificial intelligence. It is a challenge for knowledge engineering to construct a high-quality domain knowledge graph for multi-source heterogeneous data. We propose a complete process framework for constructing a knowledge graph that combines structured data and unstructured data, which includes data processing, information extraction, knowledge fusion, data storage, and update strategies, aiming to improve the quality of the knowledge graph and extend its life cycle. Specifically, we take the construction process of an enterprise knowledge graph as an example and integrate enterprise register information, litigation-related information, and enterprise announcement information to enrich the enterprise knowledge graph. For the unstructured text, we improve an existing model to extract triples, and the F1-score of our model reaches 72.77%. The numbers of nodes and edges in our constructed enterprise knowledge graph reach 1,430,000 and 3,170,000, respectively. Furthermore, for each type of multi-source heterogeneous data, we apply corresponding methods and strategies for information extraction and data storage and carry out a detailed comparative analysis of graph databases. From the perspective of practical use, the informative enterprise knowledge graph and its timely update can serve many actual business needs. Our proposed enterprise knowledge graph has been deployed in HuaRong RongTong (Beijing) Technology Co., Ltd. and is used by the staff as a powerful tool for corporate due diligence. The key features are reported and analyzed in the case study. Overall, this paper provides an easy-to-follow solution and practice for domain knowledge graph construction and demonstrates its application in corporate due diligence.
first_indexed 2024-03-11T21:07:42Z
format Article
id doaj.art-4d40564507fe40959c9e7e78e7d6bbcb
institution Directory Open Access Journal
issn 2624-909X
language English
last_indexed 2024-03-11T21:07:42Z
publishDate 2023-09-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Big Data
spelling doaj.art-4d40564507fe40959c9e7e78e7d6bbcb
2023-09-29T09:17:29Z
eng
Frontiers Media S.A.
Frontiers in Big Data
2624-909X
2023-09-01
6
10.3389/fdata.2023.1278153
1278153
A solution and practice for combining multi-source heterogeneous data to construct enterprise knowledge graph
Chenwei Yan (0): School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing, China
Chenwei Yan (1): Key Laboratory of Trustworthy Distributed Computing and Service (BUPT), Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China
Xinyue Fang (2): School of Economics and Management, Tsinghua University, Beijing, China
Xiaotong Huang (3): School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing, China
Xiaotong Huang (4): Key Laboratory of Trustworthy Distributed Computing and Service (BUPT), Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China
Chenyi Guo (5): Department of Electronic Engineering, Tsinghua University, Beijing, China
Ji Wu (6): Department of Electronic Engineering, Tsinghua University, Beijing, China
The knowledge graph is one of the essential infrastructures of artificial intelligence. It is a challenge for knowledge engineering to construct a high-quality domain knowledge graph for multi-source heterogeneous data. We propose a complete process framework for constructing a knowledge graph that combines structured data and unstructured data, which includes data processing, information extraction, knowledge fusion, data storage, and update strategies, aiming to improve the quality of the knowledge graph and extend its life cycle. Specifically, we take the construction process of an enterprise knowledge graph as an example and integrate enterprise register information, litigation-related information, and enterprise announcement information to enrich the enterprise knowledge graph. For the unstructured text, we improve an existing model to extract triples, and the F1-score of our model reaches 72.77%. The numbers of nodes and edges in our constructed enterprise knowledge graph reach 1,430,000 and 3,170,000, respectively. Furthermore, for each type of multi-source heterogeneous data, we apply corresponding methods and strategies for information extraction and data storage and carry out a detailed comparative analysis of graph databases. From the perspective of practical use, the informative enterprise knowledge graph and its timely update can serve many actual business needs. Our proposed enterprise knowledge graph has been deployed in HuaRong RongTong (Beijing) Technology Co., Ltd. and is used by the staff as a powerful tool for corporate due diligence. The key features are reported and analyzed in the case study. Overall, this paper provides an easy-to-follow solution and practice for domain knowledge graph construction and demonstrates its application in corporate due diligence.
https://www.frontiersin.org/articles/10.3389/fdata.2023.1278153/full
knowledge graph construction
heterogeneous data
knowledge graph update
enterprise knowledge graph
graph database
spellingShingle Chenwei Yan
Xinyue Fang
Xiaotong Huang
Chenyi Guo
Ji Wu
A solution and practice for combining multi-source heterogeneous data to construct enterprise knowledge graph
Frontiers in Big Data
knowledge graph construction
heterogeneous data
knowledge graph update
enterprise knowledge graph
graph database
title A solution and practice for combining multi-source heterogeneous data to construct enterprise knowledge graph
title_sort solution and practice for combining multi source heterogeneous data to construct enterprise knowledge graph
topic knowledge graph construction
heterogeneous data
knowledge graph update
enterprise knowledge graph
graph database
url https://www.frontiersin.org/articles/10.3389/fdata.2023.1278153/full