Introducing Data Science Techniques by Connecting Database Concepts and dplyr

Early exposure to data science skills, such as relational databases, is essential for students in statistics as well as many other disciplines in an increasingly data-driven society. The goal of the presented pedagogy is to introduce undergraduate students to fundamental database concepts and to ill...

Bibliographic Details
Main Authors: Jennifer E. Broatch, Suzanne Dietrich, Don Goelman
Format: Article
Language: English
Published: Taylor & Francis Group 2019-09-01
Series: Journal of Statistics Education
Subjects: data science; databases; education; teaching tool
Online Access: http://dx.doi.org/10.1080/10691898.2019.1647768
_version_ 1818336692066582528
author Jennifer E. Broatch
Suzanne Dietrich
Don Goelman
collection DOAJ
description Early exposure to data science skills, such as relational databases, is essential for students in statistics as well as many other disciplines in an increasingly data-driven society. The goal of the presented pedagogy is to introduce undergraduate students to fundamental database concepts and to illuminate the connection between these database concepts and the functionality provided by the dplyr package for R. Specifically, students are introduced to relational database concepts using visualizations designed for students with no data science or computing background. These educational tools, which are freely available on the Web, engage students in the learning process through a dynamic presentation that gently introduces relational databases and how to ask questions of data stored in a relational database. The visualizations are designed for self-study and include a formative self-assessment feature. Students are then assigned a corresponding statistics lesson that uses R within the dplyr framework and emphasizes the need for these database skills. This article describes a pilot experience of introducing this pedagogy into a calculus-based introductory statistics course for mathematics and statistics majors, and provides a brief evaluation of the student perspective of the experience. Supplementary materials for this article are available online.
first_indexed 2024-12-13T14:43:21Z
format Article
id doaj.art-03aa5013afe54f7b99cefe67a4f7368a
institution Directory Open Access Journal
issn 1069-1898
language English
last_indexed 2024-12-13T14:43:21Z
publishDate 2019-09-01
publisher Taylor & Francis Group
record_format Article
series Journal of Statistics Education
spelling doaj.art-03aa5013afe54f7b99cefe67a4f7368a | 2022-12-21T23:41:32Z | eng | Taylor & Francis Group | Journal of Statistics Education | 1069-1898 | 2019-09-01 | 27 | 3 | 147 | 153 | 10.1080/10691898.2019.1647768 | 1647768 | Introducing Data Science Techniques by Connecting Database Concepts and dplyr | Jennifer E. Broatch 0 | Suzanne Dietrich 1 | Don Goelman 2 | Arizona State University | Arizona State University | Villanova University | Early exposure to data science skills, such as relational databases, is essential for students in statistics as well as many other disciplines in an increasingly data driven society. The goal of the presented pedagogy is to introduce undergraduate students to fundamental database concepts and to illuminate the connection between these database concepts and the functionality provided by the dplyr package for R. Specifically, students are introduced to relational database concepts using visualizations that are specifically designed for students with no data science or computing background. These educational tools, which are freely available on the Web, engage students in the learning process through a dynamic presentation that gently introduces relational databases and how to ask questions of data stored in a relational database. The visualizations are specifically designed for self-study by students, including a formative self-assessment feature. Students are then assigned a corresponding statistics lesson to utilize statistical software in R within the dplyr framework and to emphasize the need for these database skills. This article describes a pilot experience of introducing this pedagogy into a calculus-based introductory statistics course for mathematics and statistics majors, and provides a brief evaluation of the student perspective of the experience. Supplementary materials for this article are available online. | http://dx.doi.org/10.1080/10691898.2019.1647768 | data science | databases | education | teaching tool
title Introducing Data Science Techniques by Connecting Database Concepts and dplyr
topic data science
databases
education
teaching tool
url http://dx.doi.org/10.1080/10691898.2019.1647768