How to Inspect and Measure Data Quality about Scientific Publications: Use Case of Wikipedia and CRIS Databases

The quality assurance of publication data in collaborative knowledge bases and in current research information systems (CRIS) becomes more and more relevant by the use of freely available spatial information in different application scenarios. When integrating this data into CRIS, it is necessary to...

Bibliographic Details
Main Authors: Otmane Azeroual, Włodzimierz Lewoniewski
Format: Article
Language: English
Published: MDPI AG 2020-04-01
Series: Algorithms
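Subjects: Wikipedia; current research information systems (CRIS); publications data; data quality; objective quality dimensions; research data processing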
Online Access: https://www.mdpi.com/1999-4893/13/5/107
_version_ 1797569557268791296
author Otmane Azeroual
Włodzimierz Lewoniewski
author_facet Otmane Azeroual
Włodzimierz Lewoniewski
author_sort Otmane Azeroual
collection DOAJ
description The quality assurance of publication data in collaborative knowledge bases and in current research information systems (CRIS) becomes more and more relevant by the use of freely available spatial information in different application scenarios. When integrating this data into CRIS, it is necessary to be able to recognize and assess their quality. Only then is it possible to compile a result from the available data that fulfills its purpose for the user, namely to deliver reliable data and information. This paper discussed the quality problems of source metadata in Wikipedia and CRIS. Based on real data from over 40 million Wikipedia articles in various languages, we performed preliminary quality analysis of the metadata of scientific publications using a data quality tool. So far, no data quality measurements have been programmed with Python to assess the quality of metadata from scientific publications in Wikipedia and CRIS. With this in mind, we programmed the methods and algorithms as code, but presented it in the form of pseudocode in this paper to measure the quality related to objective data quality dimensions such as completeness, correctness, consistency, and timeliness. This was prepared as a macro service so that the users can use the measurement results with the program code to make a statement about their scientific publications metadata so that the management can rely on high-quality data when making decisions.
first_indexed 2024-03-10T20:13:07Z
format Article
id doaj.art-46e3109ebd72472584af02202640c15d
institution Directory of Open Access Journals
issn 1999-4893
language English
last_indexed 2024-03-10T20:13:07Z
publishDate 2020-04-01
publisher MDPI AG
record_format Article
series Algorithms
spelling doaj.art-46e3109ebd72472584af02202640c15d
2023-11-19T22:45:36Z
eng
MDPI AG
Algorithms
1999-4893
2020-04-01
13
5
107
10.3390/a13050107
How to Inspect and Measure Data Quality about Scientific Publications: Use Case of Wikipedia and CRIS Databases
Otmane Azeroual [0]
Włodzimierz Lewoniewski [1]
German Centre for Higher Education Research and Science Studies (DZHW), 10117 Berlin, Germany
Department of Information Systems, Poznań University of Economics and Business, 61-875 Poznań, Poland
The quality assurance of publication data in collaborative knowledge bases and in current research information systems (CRIS) becomes more and more relevant by the use of freely available spatial information in different application scenarios. When integrating this data into CRIS, it is necessary to be able to recognize and assess their quality. Only then is it possible to compile a result from the available data that fulfills its purpose for the user, namely to deliver reliable data and information. This paper discussed the quality problems of source metadata in Wikipedia and CRIS. Based on real data from over 40 million Wikipedia articles in various languages, we performed preliminary quality analysis of the metadata of scientific publications using a data quality tool. So far, no data quality measurements have been programmed with Python to assess the quality of metadata from scientific publications in Wikipedia and CRIS. With this in mind, we programmed the methods and algorithms as code, but presented it in the form of pseudocode in this paper to measure the quality related to objective data quality dimensions such as completeness, correctness, consistency, and timeliness. This was prepared as a macro service so that the users can use the measurement results with the program code to make a statement about their scientific publications metadata so that the management can rely on high-quality data when making decisions.
https://www.mdpi.com/1999-4893/13/5/107
Wikipedia
current research information systems (CRIS)
publications data
data quality
objective quality dimensions
research data processing
spellingShingle Otmane Azeroual
Włodzimierz Lewoniewski
How to Inspect and Measure Data Quality about Scientific Publications: Use Case of Wikipedia and CRIS Databases
Algorithms
Wikipedia
current research information systems (CRIS)
publications data
data quality
objective quality dimensions
research data processing
title How to Inspect and Measure Data Quality about Scientific Publications: Use Case of Wikipedia and CRIS Databases
title_full How to Inspect and Measure Data Quality about Scientific Publications: Use Case of Wikipedia and CRIS Databases
title_fullStr How to Inspect and Measure Data Quality about Scientific Publications: Use Case of Wikipedia and CRIS Databases
title_full_unstemmed How to Inspect and Measure Data Quality about Scientific Publications: Use Case of Wikipedia and CRIS Databases
title_short How to Inspect and Measure Data Quality about Scientific Publications: Use Case of Wikipedia and CRIS Databases
title_sort how to inspect and measure data quality about scientific publications use case of wikipedia and cris databases
topic Wikipedia
current research information systems (CRIS)
publications data
data quality
objective quality dimensions
research data processing
url https://www.mdpi.com/1999-4893/13/5/107
work_keys_str_mv AT otmaneazeroual howtoinspectandmeasuredataqualityaboutscientificpublicationsusecaseofwikipediaandcrisdatabases
AT włodzimierzlewoniewski howtoinspectandmeasuredataqualityaboutscientificpublicationsusecaseofwikipediaandcrisdatabases