AgTC and AgETL: open-source tools to enhance data collection and management for plant science research

Bibliographic Details
Main Authors: Luis Vargas-Rojas, To-Chia Ting, Katherine M. Rainey, Matthew Reynolds, Diane R. Wang
Format: Article
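Language: English
Published: Frontiers Media S.A. 2024-02-01
Series: Frontiers in Plant Science
Subjects: data pipeline; extract-transform-load; database; data aggregation; data processing; plant phenotyping
Online Access: https://www.frontiersin.org/articles/10.3389/fpls.2024.1265073/full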
collection DOAJ
description Advancements in phenotyping technology have enabled plant science researchers to gather large volumes of information from their experiments, especially those that evaluate multiple genotypes. To fully leverage these complex and often heterogeneous data sets (i.e., those that differ in format and structure), scientists must invest considerable time in data processing, and data management has emerged as a significant barrier to downstream application. Here, we propose a pipeline, comprising two newly developed open-source programs, to enhance data collection, processing, and management in plant science studies. The first, called AgTC, is a series of programming functions that generates comma-separated values (CSV) file templates to collect data in a standard format using either a lab-based computer or a mobile device. The second series of functions, AgETL, executes the steps of an Extract-Transform-Load (ETL) data integration process in which data are extracted from heterogeneously formatted files, transformed to meet standard criteria, and loaded into a database. There, data are stored and can be accessed for data analysis-related processes, including dynamic data visualization through web-based tools. Both AgTC and AgETL are flexible for application across plant science experiments without programming knowledge on the part of the domain scientist, and their functions are executed in Jupyter Notebook, a browser-based interactive development environment. Additionally, all parameters are easily customized from central configuration files written in the human-readable YAML format. Using three experiments from research laboratories in university and non-governmental organization (NGO) settings as test cases, we demonstrate the utility of AgTC and AgETL to streamline critical steps from data collection to analysis in the plant sciences.
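The Extract-Transform-Load process named in the description can be illustrated with a minimal sketch. It is not AgETL's code or API: the function names, the `COLUMN_ALIASES` mapping, the `observations` table, and the sample data are all hypothetical, and a plain dictionary stands in for the YAML configuration file so that only the Python standard library is needed.

```python
import csv
import io
import sqlite3

# Hypothetical alias table: maps each heterogeneous header to a standard name.
# In the described pipeline, settings like these would live in a YAML file.
COLUMN_ALIASES = {
    "Plot ID": "plot_id", "plot": "plot_id", "plot_id": "plot_id",
    "Height (cm)": "height_cm", "ht": "height_cm", "height_cm": "height_cm",
}


def extract(text, delimiter=","):
    """Extract: parse delimited text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def transform(rows):
    """Transform: rename columns to the standard names and coerce types."""
    records = []
    for row in rows:
        renamed = {
            COLUMN_ALIASES[key.strip()]: value.strip()
            for key, value in row.items()
            if key and key.strip() in COLUMN_ALIASES
        }
        records.append((renamed["plot_id"], float(renamed["height_cm"])))
    return records


def load(conn, records):
    """Load: append the standardized records to a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS observations (plot_id TEXT, height_cm REAL)"
    )
    conn.executemany("INSERT INTO observations VALUES (?, ?)", records)
    conn.commit()


def run_pipeline(sources, conn):
    """Run extract, transform, and load for each (text, delimiter) source."""
    for text, delimiter in sources:
        load(conn, transform(extract(text, delimiter)))
    query = "SELECT plot_id, height_cm FROM observations ORDER BY plot_id"
    return conn.execute(query).fetchall()


# Two made-up inputs that differ in header names and delimiter.
file_a = "Plot ID,Height (cm)\nP1,82.5\nP2,90\n"
file_b = "plot;ht\nP3;77.1\n"
connection = sqlite3.connect(":memory:")
rows = run_pipeline([(file_a, ","), (file_b, ";")], connection)
```

After the run, `rows` holds one standardized record per plot regardless of which input layout it came from. An in-memory SQLite database is used here only to keep the sketch self-contained; the record does not state which database system the published tools target.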
first_indexed 2024-03-07T23:24:56Z
format Article
id doaj.art-96e2051d58cc4dd69040271446233524
institution Directory of Open Access Journals
issn 1664-462X
language English
last_indexed 2024-03-07T23:24:56Z
publishDate 2024-02-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Plant Science
spelling doaj.art-96e2051d58cc4dd69040271446233524; 2024-02-21T05:13:37Z; eng
Frontiers Media S.A.; Frontiers in Plant Science; ISSN 1664-462X; 2024-02-01; volume 15; article 1265073; DOI 10.3389/fpls.2024.1265073
Luis Vargas-Rojas: Department of Agronomy, Purdue University, West Lafayette, IN, United States
To-Chia Ting: Department of Agronomy, Purdue University, West Lafayette, IN, United States
Katherine M. Rainey: Department of Agronomy, Purdue University, West Lafayette, IN, United States
Matthew Reynolds: Wheat Physiology Group, International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
Diane R. Wang: Department of Agronomy, Purdue University, West Lafayette, IN, United States
title AgTC and AgETL: open-source tools to enhance data collection and management for plant science research
topic data pipeline
extract-transform-load
database
data aggregation
data processing
plant phenotyping
url https://www.frontiersin.org/articles/10.3389/fpls.2024.1265073/full