Development of the InTelligence And Machine LEarning (TAME) Toolkit for Introductory Data Science, Chemical-Biological Analyses, Predictive Modeling, and Database Mining for Environmental Health Research

Research in environmental health is becoming increasingly reliant upon data science and computational methods that can more efficiently extract information from complex datasets. Data science and computational methods can be leveraged to better identify relationships between exposures to stressors i...

Bibliographic Details
Main Authors: Kyle Roell, Lauren E. Koval, Rebecca Boyles, Grace Patlewicz, Caroline Ring, Cynthia V. Rider, Cavin Ward-Caviness, David M. Reif, Ilona Jaspers, Rebecca C. Fry, Julia E. Rager
Format: Article
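Language: English
Published: Frontiers Media S.A. 2022-06-01
Series: Frontiers in Toxicology
Subjects: bioinformatics and computational biology; cheminformatics; data science; epidemiology; exposure science; machine learning
Online Access: https://www.frontiersin.org/articles/10.3389/ftox.2022.893924/full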
author Kyle Roell
Lauren E. Koval
Rebecca Boyles
Grace Patlewicz
Caroline Ring
Cynthia V. Rider
Cavin Ward-Caviness
David M. Reif
Ilona Jaspers
Rebecca C. Fry
Julia E. Rager
author_sort Kyle Roell
collection DOAJ
description Research in environmental health is becoming increasingly reliant upon data science and computational methods that can more efficiently extract information from complex datasets. These methods can be leveraged to better identify relationships between exposures to stressors in the environment and human disease outcomes, representing critical information needed to protect and improve global public health. Still, there remains a critical gap surrounding the training of researchers on these in silico methods. We aimed to address this gap by developing the inTelligence And Machine lEarning (TAME) Toolkit, promoting trainee-driven data generation, management, and analysis methods to “TAME” data in environmental health studies. Training modules were developed to provide applications-driven examples of data organization and analysis methods that can be used to address environmental health questions. Target audiences for these modules include students, post-baccalaureate and postdoctoral trainees, and professionals who are interested in expanding their skillset to include recent advances in data analysis methods relevant to environmental health, toxicology, exposure science, epidemiology, and bioinformatics/cheminformatics. Modules were developed by study coauthors using annotated scripts and were organized into three chapters within a GitHub Bookdown site. The first chapter of modules focuses on introductory data science, which includes the following topics: setting up R/RStudio and coding in the R environment; data organization basics; finding and visualizing data trends; high-dimensional data visualizations; and Findability, Accessibility, Interoperability, and Reusability (FAIR) data management practices. The second chapter of modules incorporates chemical-biological analyses and predictive modeling, spanning the following methods: dose-response modeling; machine learning and predictive modeling; mixtures analyses; -omics analyses; toxicokinetic modeling; and read-across toxicity predictions. The last chapter of modules was organized to provide examples of environmental health database mining and integration, including chemical exposure, health outcome, and environmental justice indicators. Training modules and associated data are publicly available online (https://uncsrp.github.io/Data-Analysis-Training-Modules/). Together, this resource provides unique opportunities to obtain introductory-level training on current data analysis methods applicable to 21st-century science and environmental health.
first_indexed 2024-04-13T19:39:43Z
format Article
id doaj.art-1eb5b45b3f9746218e2f2940b506c6cc
institution Directory of Open Access Journals
issn 2673-3080
language English
last_indexed 2024-04-13T19:39:43Z
publishDate 2022-06-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Toxicology
spelling doaj.art-1eb5b45b3f9746218e2f2940b506c6cc
2022-12-22T02:32:56Z
eng
Frontiers Media S.A.
Frontiers in Toxicology
2673-3080
2022-06-01
4
10.3389/ftox.2022.893924
893924
Development of the InTelligence And Machine LEarning (TAME) Toolkit for Introductory Data Science, Chemical-Biological Analyses, Predictive Modeling, and Database Mining for Environmental Health Research
Kyle Roell: The Institute for Environmental Health Solutions, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
Lauren E. Koval: The Institute for Environmental Health Solutions, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, NC, United States; Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
Rebecca Boyles: Research Computing, RTI International, Durham, NC, United States
Grace Patlewicz: Center for Computational Toxicology and Exposure, US Environmental Protection Agency, Durham, NC, United States
Caroline Ring: Center for Computational Toxicology and Exposure, US Environmental Protection Agency, Durham, NC, United States
Cynthia V. Rider: Division of the National Toxicology Program, National Institute of Environmental Health Sciences, Durham, NC, United States
Cavin Ward-Caviness: Center for Public Health and Environmental Assessment, US Environmental Protection Agency, Chapel Hill, NC, United States
David M. Reif: Bioinformatics Research Center, Department of Biological Sciences, North Carolina State University, Raleigh, NC, United States
Ilona Jaspers: The Institute for Environmental Health Solutions, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, NC, United States; Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, NC, United States; Curriculum in Toxicology and Environmental Medicine, School of Medicine, University of North Carolina, Chapel Hill, NC, United States; Center for Environmental Medicine, Asthma and Lung Biology, School of Medicine, University of North Carolina, Chapel Hill, NC, United States; Department of Pediatrics, Microbiology and Immunology, School of Medicine, University of North Carolina, Chapel Hill, NC, United States
Rebecca C. Fry: The Institute for Environmental Health Solutions, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, NC, United States; Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, NC, United States; Curriculum in Toxicology and Environmental Medicine, School of Medicine, University of North Carolina, Chapel Hill, NC, United States
Julia E. Rager: The Institute for Environmental Health Solutions, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, NC, United States; Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, NC, United States; Curriculum in Toxicology and Environmental Medicine, School of Medicine, University of North Carolina, Chapel Hill, NC, United States; Center for Environmental Medicine, Asthma and Lung Biology, School of Medicine, University of North Carolina, Chapel Hill, NC, United States
https://www.frontiersin.org/articles/10.3389/ftox.2022.893924/full
bioinformatics and computational biology
cheminformatics
data science
epidemiology
exposure science
machine learning
title Development of the InTelligence And Machine LEarning (TAME) Toolkit for Introductory Data Science, Chemical-Biological Analyses, Predictive Modeling, and Database Mining for Environmental Health Research
topic bioinformatics and computational biology
cheminformatics
data science
epidemiology
exposure science
machine learning
url https://www.frontiersin.org/articles/10.3389/ftox.2022.893924/full