“R” U ready?: a case study using R to analyze changes in gene expression during evolution

Bibliographic Details
Main Authors: Amy E. Pomeroy, Andrea Bixler, Stefanie H. Chen, Jennifer E. Kerr, Todd D. Levine, Elizabeth F. Ryder
Format: Article
Language: English
Published: Frontiers Media S.A. 2024-03-01
Series: Frontiers in Education
Online Access:https://www.frontiersin.org/articles/10.3389/feduc.2024.1379910/full
Abstract: As high-throughput methods become more common, training undergraduates to analyze data must include having them generate informative summaries of large datasets. This flexible case study gives undergraduate students an opportunity to become familiar with the capabilities of R programming in the context of high-throughput evolutionary data collected using macroarrays. The storyline introduces a recent graduate who is hired at a biotech firm and tasked with analyzing and visualizing changes in gene expression over 20,000 generations of the Lenski Lab’s Long-Term Evolution Experiment (LTEE). Our main character is not familiar with R and is guided by a coworker in learning the platform. Initially, this involves a step-by-step analysis of the small Iris dataset built into R, which contains sepal and petal measurements for three species of iris. Practice calculating summary statistics and correlations, and making histograms and scatter plots, prepares the protagonist to perform similar analyses on the LTEE dataset. In the LTEE module, students analyze gene expression data from the long-term evolution experiment, developing their skills in manipulating and interpreting large scientific datasets through visualization and statistical analysis. Prerequisite knowledge includes basic statistics, the Central Dogma, and basic evolutionary principles. The Iris module provides hands-on experience using R to explore and visualize a simple dataset; it can be used on its own as an introduction to R for biological data, or skipped if students already have some experience with R. Both modules emphasize understanding the utility of R rather than the creation of original code. Pilot testing showed that the case study was well received by students and faculty, who described it as a clear introduction to R and appreciated the value of R for visualizing and analyzing large datasets.
ISSN: 2504-284X
Collection: Directory of Open Access Journals (DOAJ)
DOAJ record ID: doaj.art-a3a46ee370a84cfeba8f6d3b68d0193c
Volume: 9
DOI: 10.3389/feduc.2024.1379910
Article number: 1379910
Author affiliations:
Amy E. Pomeroy: Department of Pharmacology, Computational Medicine Program, UNC Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
Andrea Bixler: Biology Program, Clarke University, Dubuque, IA, United States
Stefanie H. Chen: Department of Biological Sciences, North Carolina State University, Raleigh, NC, United States; Biotechnology Program, North Carolina State University, Raleigh, NC, United States
Jennifer E. Kerr: Department of Biology, Notre Dame of Maryland University, Baltimore, MD, United States
Todd D. Levine: Department of Life Sciences and Prairie Springs Environmental Education Center, Carroll University, Waukesha, WI, United States
Elizabeth F. Ryder: Department of Biology and Biotechnology, Worcester Polytechnic Institute, Worcester, MA, United States
Keywords: high-throughput data analysis; R programming; case studies; evolutionary biology; data cleaning; data visualization