Data Validation Infrastructure for R
Main Authors: | Mark P. J. van der Loo, Edwin de Jonge |
---|---|
Format: | Article |
Language: | English |
Published: | Foundation for Open Access Statistics, 2021-03-01 |
Series: | Journal of Statistical Software |
Subjects: | data checking, data quality, data cleaning, R |
Online Access: | https://www.jstatsoft.org/index.php/jss/article/view/3483 |
author | Mark P. J. van der Loo, Edwin de Jonge |
---|---|
collection | DOAJ |
description | Checking data quality against domain knowledge is a common activity that pervades statistical analysis from raw data to output. The R package validate facilitates this task by capturing and applying expert knowledge in the form of validation rules: logical restrictions on variables, records, or data sets that should be satisfied before they are considered valid input for further analysis. In the validate package, validation rules are objects of computation that can be manipulated, investigated, and confronted with data or versions of a data set. The results of a confrontation are then available for further investigation, summarization or visualization. Validation rules can also be endowed with metadata and documentation and they may be stored or retrieved from external sources such as text files or tabular formats. This data validation infrastructure thus allows for systematic, user-defined definition of data quality requirements that can be reused for various versions of a data set or by data correction algorithms that are parameterized by validation rules. |
institution | Directory of Open Access Journals |
issn | 1548-7660 |
doi | 10.18637/jss.v097.i10 |
topic | data checking, data quality, data cleaning, R |
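The abstract in the description field describes a workflow in which validation rules are objects that are defined once, confronted with data, and then summarized. The following is a minimal sketch of that workflow, assuming the validate package is installed from CRAN. It uses the package's `validator()`, `confront()`, and `summary()` functions on R's built-in `women` data set. The rule names and the ratio threshold are hypothetical and chosen for illustration only; they do not come from the article.

```r
# Sketch only: assumes the 'validate' package is installed, e.g. via
# install.packages("validate"). Rules and thresholds are illustrative.
library(validate)

# Validation rules are ordinary R objects; named arguments name each rule.
rules <- validator(
  pos_height = height > 0,           # restriction on a single variable
  pos_weight = weight > 0,           # restriction on a single variable
  ratio      = weight / height < 3,  # record-wise check (hypothetical threshold)
  avg_height = mean(height) > 60     # restriction on the data set as a whole
)

# Rules can be inspected like data, e.g. as a data frame with one row per rule.
rules_df <- as.data.frame(rules)

# Confront the built-in 'women' data set (15 records) with the rules.
out <- confront(women, rules)

# One row per rule, with counts of items, passes, fails and missing values.
s <- summary(out)
print(s)
```

Because the rules are kept in an object separate from the data, the same `rules` can be confronted with a later version of the data set, which is the kind of reuse the abstract emphasizes.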