Beyond linear regression: A reference for analyzing common data types in discipline based education research

Bibliographic Details
Main Authors: Elli J. Theobald, Melissa Aikens, Sarah Eddy, Hannah Jordt
Format: Article
Language: English
Published: American Physical Society 2019-07-01
Series: Physical Review Physics Education Research
Online Access: http://doi.org/10.1103/PhysRevPhysEducRes.15.020110
author Elli J. Theobald
Melissa Aikens
Sarah Eddy
Hannah Jordt
collection DOAJ
description [This paper is part of the Focused Collection on Quantitative Methods in PER: A Critical Examination.] A common goal in discipline-based education research (DBER) is to determine how to improve student outcomes. Linear regression is a common technique used to test hypotheses about the effects of interventions on continuous outcomes (such as exam score) as well as to control for student nonequivalence in quasirandom experimental designs. (In quasirandom designs, subjects are not randomly assigned to treatments. For example, when treatment is assigned by classroom and observations are made on students, the design is quasirandom because treatment is assigned to classrooms, not to individual subjects (students).) However, many types of outcome data cannot be appropriately analyzed with linear regression. In these instances, researchers must move beyond linear regression and implement alternative regression techniques. For example, student outcomes can be measured on binary scales (e.g., pass or fail), tightly bounded scales (e.g., strongly agree to strongly disagree), or nominal scales (i.e., different discrete choices, for example, multiple tracks within a physics major), each necessitating alternative regression techniques. Here, we review extensions of linear modeling, namely generalized linear models (glms), and specifically compare five glms that are useful for analyzing DBER data: logistic, binomial, proportional odds (also called ordinal; including censored regression), multinomial, and Poisson (including negative binomial, hurdle, and zero-inflated) regression. We introduce a diagnostic tool to facilitate a researcher’s identification of the most appropriate glm for their own data. For each model type, we explain when, why, and how to implement the regression approach. When: we provide examples of the types of research questions and outcome data that would motivate this regression approach, including citations to articles in the DBER literature. Why: we name which linear regression assumption is violated by the data type. How: we detail implementation and interpretation of this modeling approach in R, including R syntax and code, and how to discuss the regression output in research papers. Code accompanying each analysis can be found in the online GitHub repository associated with this paper (https://github.com/ejtheobald/BeyondLinearRegression). This paper is not an exhaustive review of regression techniques, nor does it review nonregression-based analyses. Rather, it aims to compile and summarize regression techniques useful for the most common types of DBER data and to provide examples, citations, and heavily annotated R code so that researchers can easily implement each technique in their work.
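The pairing of outcome data types with regression families described above can be sketched as a small lookup. This is an illustrative Python sketch only, not the authors' diagnostic tool and not their R code (which lives in the linked repository). The names `GLM_FOR_OUTCOME` and `suggest_glm` are hypothetical, and the "proportion" and "count" labels reflect the standard uses of binomial and Poisson regression rather than wording taken from the abstract.

```python
# Hypothetical lookup from outcome data type to the regression family
# named in the abstract. The keys are illustrative labels, not terms
# defined by the paper.
GLM_FOR_OUTCOME = {
    "continuous": "linear regression",                    # e.g., exam score
    "binary": "logistic regression",                      # e.g., pass or fail
    "proportion": "binomial regression",                  # assumption: successes out of trials
    "ordinal": "proportional odds (ordinal) regression",  # e.g., strongly agree to strongly disagree
    "nominal": "multinomial regression",                  # e.g., tracks within a physics major
    "count": "Poisson regression",                        # assumption: non-negative integer counts
}


def suggest_glm(outcome_type: str) -> str:
    """Return the regression family paired with an outcome type label.

    Raises ValueError for labels that are not in the lookup table.
    """
    key = outcome_type.strip().lower()
    if key not in GLM_FOR_OUTCOME:
        known = ", ".join(sorted(GLM_FOR_OUTCOME))
        raise ValueError(f"unknown outcome type {outcome_type!r}; expected one of: {known}")
    return GLM_FOR_OUTCOME[key]


if __name__ == "__main__":
    # Print the suggested model for each outcome type in the table.
    for outcome in GLM_FOR_OUTCOME:
        print(f"{outcome}: {suggest_glm(outcome)}")
```

A lookup like this only narrows the choice. Variants mentioned in the abstract (negative binomial, hurdle, zero-inflated, censored) depend on features of the data such as overdispersion or excess zeros, which a label alone cannot capture.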
first_indexed 2024-12-17T23:40:05Z
format Article
id doaj.art-23f5568bf4594a129a144c52f1806f35
institution Directory Open Access Journal
issn 2469-9896
language English
last_indexed 2024-12-17T23:40:05Z
publishDate 2019-07-01
publisher American Physical Society
record_format Article
series Physical Review Physics Education Research
title Beyond linear regression: A reference for analyzing common data types in discipline based education research
url http://doi.org/10.1103/PhysRevPhysEducRes.15.020110