17 Data Loofah: A web-based app for efficiently identifying erroneous data

OBJECTIVES/GOALS: The goal was to create and deploy an intuitive, easy-to-use tool that clinical investigators can apply to their data to identify erroneous or inconsistent data entries. Investigators can then correct any errors prior to sharing the data with their statistician for analysis. METHODS...


Bibliographic Details
Main Authors: Jeffrey R. Fine, Sandra L. Taylor
Format: Article
Language: English
Published: Cambridge University Press 2023-04-01
Series: Journal of Clinical and Translational Science
Online Access: https://www.cambridge.org/core/product/identifier/S2059866123001176/type/journal_article
collection DOAJ
description OBJECTIVES/GOALS: The goal was to create and deploy an intuitive, easy-to-use tool that clinical investigators can apply to their data to identify erroneous or inconsistent entries. Investigators can then correct any errors before sharing the data with their statistician for analysis.

METHODS/STUDY POPULATION: We developed an interactive Shiny app, the Data Loofah, using RStudio, that researchers or data analysts can use to examine data. After an investigator uploads data, the app reports which variables are numeric and which are categorical. For numeric variables, it reports the mean, standard deviation, median, 25th and 75th percentiles, range, and number of missing values. For categorical variables, it summarizes counts and percentages. Graphical displays further aid the identification of errors. The Data Loofah is accessed through a secure, university-maintained website, with access restricted to university personnel. Supporting materials, consisting of step-by-step instructional handouts and videos, were developed to assist investigators in using the app.

RESULTS/ANTICIPATED RESULTS: We will integrate use of the Data Loofah into our Clinical and Translational Science Program’s biostatistics consultative practice. Investigators will use the Data Loofah to pre-screen their data before sending it to a statistician, to identify errors and inconsistencies, and to make the necessary corrections. Statisticians will also use the Data Loofah to review data with investigators before starting analyses. Through use of this app, investigators are expected to develop a better understanding of their own data and, more generally, of the requirements for preparing data for statistical analysis. Most significantly, regular use of the Data Loofah is expected to result in higher-quality data and more efficient use of statistician resources, because less effort will be spent on data cleaning.

DISCUSSION/SIGNIFICANCE: Data cleaning is a time-consuming task, and finding data errors can be difficult for data analysts who are not familiar with the clinical variables under study. Further, failure to identify data errors can lead to erroneous results. By helping clinical investigators identify data errors themselves, the Data Loofah will improve and enhance research output.
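The per-variable summaries described under METHODS can be illustrated with a minimal sketch. This is not the authors' code: the Data Loofah is an R Shiny app whose source is not given in this record, and the sketch below uses Python's standard library instead. The function names, the set of tokens treated as missing, the rule that a column is numeric only if every non-missing entry parses as a number, and the sample rows are all assumptions made for illustration.

```python
import statistics
from collections import Counter

# Assumption: these strings (and None) count as missing values.
MISSING_TOKENS = {"", "NA", "NaN"}


def is_missing(value):
    """True if the entry should be counted as missing."""
    return value is None or str(value).strip() in MISSING_TOKENS


def to_float(value):
    """Return the entry as a float, or None if it is not a number."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def summarize_column(values):
    """Summarize one variable as numeric or categorical."""
    present = [v for v in values if not is_missing(v)]
    n_missing = len(values) - len(present)
    parsed = [to_float(v) for v in present]

    # Assumption: numeric means every non-missing entry parses as a number.
    if present and all(x is not None for x in parsed):
        if len(parsed) >= 2:
            q25, _, q75 = statistics.quantiles(parsed, n=4, method="inclusive")
            sd = statistics.stdev(parsed)
        else:
            q25 = q75 = parsed[0]
            sd = None  # standard deviation is undefined for one value
        return {
            "type": "numeric",
            "missing": n_missing,
            "mean": statistics.mean(parsed),
            "sd": sd,
            "median": statistics.median(parsed),
            "q25": q25,
            "q75": q75,
            "min": min(parsed),
            "max": max(parsed),
        }

    # Otherwise treat the variable as categorical.
    counts = Counter(str(v).strip() for v in present)
    total = len(present)
    return {
        "type": "categorical",
        "missing": n_missing,
        "counts": dict(counts),
        # Percentages are relative to non-missing entries.
        "percent": {k: 100 * c / total for k, c in counts.items()},
    }


def summarize(rows):
    """Summarize every column of a list of row dictionaries."""
    columns = list(rows[0].keys())
    return {c: summarize_column([r.get(c) for r in rows]) for c in columns}


# Hypothetical sample data with two planted problems:
# an implausible age (410) and an inconsistent category label ("f").
rows = [
    {"age": "34", "sex": "F"},
    {"age": "41", "sex": "M"},
    {"age": "", "sex": "f"},
    {"age": "29", "sex": "F"},
    {"age": "410", "sex": None},
]

for name, summary in summarize(rows).items():
    print(name, summary)
```

In this made-up example, the maximum of the age column and the separate count for the lowercase "f" are the kind of signals an investigator would look for when screening data before analysis.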
id doaj.art-90c66be4e56348078bdf7e09f0ad4411
institution Directory Open Access Journal
issn 2059-8661
spelling doaj.art-90c66be4e56348078bdf7e09f0ad4411; 2023-04-24T05:55:55Z; eng; Cambridge University Press; Journal of Clinical and Translational Science; 2059-8661; 2023-04-01; 7; 5; 5; 10.1017/cts.2023.117; 17 Data Loofah: A web-based app for efficiently identifying erroneous data; Jeffrey R. Fine (University of California, Davis); Sandra L. Taylor (University of California, Davis)