An open dataset of data lineage graphs for data governance research

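Data have become valuable assets for enterprises. Data governance aims to manage and reuse data assets, facilitating enterprise management and enabling product innovations. A data lineage graph (DLG) is an abstracted collection of data assets and their data lineages in data governance. Analyzing DLGs can provide rich data insights for data governance. However, the progress of data governance technologies is hindered by the shortage of available open datasets for DLGs. This paper introduces an open dataset of DLGs, including the DLG model, the dataset construction process, and applied areas. This real-world dataset, sourced from Huawei Cloud Computing Technology Company Limited, contains 18 DLGs with three types of data assets and two types of relations. To the best of our knowledge, this dataset is the first open dataset of DLGs for data governance. This dataset can also support the development of other application areas, such as graph analytics and visualization.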

Bibliographic Details
Main Authors: Yunpeng Chen, Ying Zhao, Xuanjing Li, Jiang Zhang, Jiang Long, Fangfang Zhou
Format: Article
Language: English
Published: Elsevier 2024-03-01
Series: Visual Informatics
Subjects: Data asset; Data governance; Data lineage; Graph; Open dataset
Online Access: http://www.sciencedirect.com/science/article/pii/S2468502X24000020
author Yunpeng Chen
Ying Zhao
Xuanjing Li
Jiang Zhang
Jiang Long
Fangfang Zhou
collection DOAJ
description Data have become valuable assets for enterprises. Data governance aims to manage and reuse data assets, facilitating enterprise management and enabling product innovations. A data lineage graph (DLG) is an abstracted collection of data assets and their data lineages in data governance. Analyzing DLGs can provide rich data insights for data governance. However, the progress of data governance technologies is hindered by the shortage of available open datasets for DLGs. This paper introduces an open dataset of DLGs, including the DLG model, the dataset construction process, and applied areas. This real-world dataset, sourced from Huawei Cloud Computing Technology Company Limited, contains 18 DLGs with three types of data assets and two types of relations. To the best of our knowledge, this dataset is the first open dataset of DLGs for data governance. This dataset can also support the development of other application areas, such as graph analytics and visualization.
first_indexed 2024-04-24T16:30:46Z
format Article
id doaj.art-b214f3b39b5b48dc8caa500b0bdbd341
institution Directory Open Access Journal
issn 2468-502X
language English
last_indexed 2024-04-24T16:30:46Z
publishDate 2024-03-01
publisher Elsevier
record_format Article
series Visual Informatics
spelling doaj.art-b214f3b39b5b48dc8caa500b0bdbd341 2024-03-30T04:39:47Z eng Elsevier Visual Informatics 2468-502X 2024-03-01 8115
Yunpeng Chen (Central South University, Changsha, 410083, Hunan, China)
Ying Zhao (Central South University, Changsha, 410083, Hunan, China)
Xuanjing Li (Central South University, Changsha, 410083, Hunan, China)
Jiang Zhang (Huawei Cloud Computing Technology Co., Ltd., Hangzhou, 310000, Zhejiang, China)
Jiang Long (Huawei Cloud Computing Technology Co., Ltd., Hangzhou, 310000, Zhejiang, China)
Fangfang Zhou (Central South University, Changsha, 410083, Hunan, China; corresponding author)
title An open dataset of data lineage graphs for data governance research
topic Data asset
Data governance
Data lineage
Graph
Open dataset
url http://www.sciencedirect.com/science/article/pii/S2468502X24000020