Development of the InTelligence And Machine LEarning (TAME) Toolkit for Introductory Data Science, Chemical-Biological Analyses, Predictive Modeling, and Database Mining for Environmental Health Research
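Main Authors: | Kyle Roell, Lauren E. Koval, Rebecca Boyles, Grace Patlewicz, Caroline Ring, Cynthia V. Rider, Cavin Ward-Caviness, David M. Reif, Ilona Jaspers, Rebecca C. Fry, Julia E. Rager |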
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2022-06-01 |
Series: | Frontiers in Toxicology |
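Subjects: | bioinformatics and computational biology; cheminformatics; data science; epidemiology; exposure science; machine learning |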
Online Access: | https://www.frontiersin.org/articles/10.3389/ftox.2022.893924/full |
_version_ | 1811343981985923072 |
---|---|
author | Kyle Roell; Lauren E. Koval; Rebecca Boyles; Grace Patlewicz; Caroline Ring; Cynthia V. Rider; Cavin Ward-Caviness; David M. Reif; Ilona Jaspers; Rebecca C. Fry; Julia E. Rager |
author_sort | Kyle Roell |
collection | DOAJ |
description | Research in environmental health is becoming increasingly reliant upon data science and computational methods that can more efficiently extract information from complex datasets. Data science and computational methods can be leveraged to better identify relationships between exposures to stressors in the environment and human disease outcomes, representing critical information needed to protect and improve global public health. Still, there remains a critical gap surrounding the training of researchers on these in silico methods. We aimed to address this gap by developing the inTelligence And Machine lEarning (TAME) Toolkit, promoting trainee-driven data generation, management, and analysis methods to “TAME” data in environmental health studies. Training modules were developed to provide applications-driven examples of data organization and analysis methods that can be used to address environmental health questions. Target audiences for these modules include students, post-baccalaureate and post-doctorate trainees, and professionals who are interested in expanding their skillset to include recent advances in data analysis methods relevant to environmental health, toxicology, exposure science, epidemiology, and bioinformatics/cheminformatics. Modules were developed by study coauthors using annotated scripts and were organized into three chapters within a GitHub Bookdown site. The first chapter of modules focuses on introductory data science, which includes the following topics: setting up R/RStudio and coding in the R environment; data organization basics; finding and visualizing data trends; high-dimensional data visualizations; and Findability, Accessibility, Interoperability, and Reusability (FAIR) data management practices.
The second chapter of modules incorporates chemical-biological analyses and predictive modeling, spanning the following methods: dose-response modeling; machine learning and predictive modeling; mixtures analyses; -omics analyses; toxicokinetic modeling; and read-across toxicity predictions. The last chapter of modules was organized to provide examples of environmental health database mining and integration, including chemical exposure, health outcome, and environmental justice indicators. Training modules and associated data are publicly available online (https://uncsrp.github.io/Data-Analysis-Training-Modules/). Together, this resource provides unique opportunities to obtain introductory-level training on current data analysis methods applicable to 21st-century science and environmental health. |
first_indexed | 2024-04-13T19:39:43Z |
format | Article |
id | doaj.art-1eb5b45b3f9746218e2f2940b506c6cc |
institution | Directory of Open Access Journals |
issn | 2673-3080 |
language | English |
last_indexed | 2024-04-13T19:39:43Z |
publishDate | 2022-06-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Toxicology |
spelling | doaj.art-1eb5b45b3f9746218e2f2940b506c6cc; 2022-12-22T02:32:56Z; eng; Frontiers Media S.A.; Frontiers in Toxicology; ISSN 2673-3080; 2022-06-01; vol. 4; doi:10.3389/ftox.2022.893924; article 893924; Development of the InTelligence And Machine LEarning (TAME) Toolkit for Introductory Data Science, Chemical-Biological Analyses, Predictive Modeling, and Database Mining for Environmental Health Research.
Authors: Kyle Roell (1); Lauren E. Koval (1, 2); Rebecca Boyles (3); Grace Patlewicz (4); Caroline Ring (4); Cynthia V. Rider (5); Cavin Ward-Caviness (6); David M. Reif (7); Ilona Jaspers (1, 2, 8, 9, 10); Rebecca C. Fry (1, 2, 8); Julia E. Rager (1, 2, 8, 9).
Affiliations: (1) The Institute for Environmental Health Solutions, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, NC, United States; (2) Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, NC, United States; (3) Research Computing, RTI International, Durham, NC, United States; (4) Center for Computational Toxicology and Exposure, US Environmental Protection Agency, Durham, NC, United States; (5) Division of the National Toxicology Program, National Institute of Environmental Health Sciences, Durham, NC, United States; (6) Center for Public Health and Environmental Assessment, US Environmental Protection Agency, Chapel Hill, NC, United States; (7) Bioinformatics Research Center, Department of Biological Sciences, North Carolina State University, Raleigh, NC, United States; (8) Curriculum in Toxicology and Environmental Medicine, School of Medicine, University of North Carolina, Chapel Hill, NC, United States; (9) Center for Environmental Medicine, Asthma and Lung Biology, School of Medicine, University of North Carolina, Chapel Hill, NC, United States; (10) Department of Pediatrics, Microbiology and Immunology, School of Medicine, University of North Carolina, Chapel Hill, NC, United States.
Keywords: bioinformatics and computational biology; cheminformatics; data science; epidemiology; exposure science; machine learning. URL: https://www.frontiersin.org/articles/10.3389/ftox.2022.893924/full |
title | Development of the InTelligence And Machine LEarning (TAME) Toolkit for Introductory Data Science, Chemical-Biological Analyses, Predictive Modeling, and Database Mining for Environmental Health Research |
topic | bioinformatics and computational biology; cheminformatics; data science; epidemiology; exposure science; machine learning |
url | https://www.frontiersin.org/articles/10.3389/ftox.2022.893924/full |