Beyond linear regression: A reference for analyzing common data types in discipline based education research
[This paper is part of the Focused Collection on Quantitative Methods in PER: A Critical Examination.] A common goal in discipline-based education research (DBER) is to determine how to improve student outcomes. Linear regression is a common technique used to test hypotheses about the effects of int...
Main Authors: | Elli J. Theobald, Melissa Aikens, Sarah Eddy, Hannah Jordt |
---|---|
Format: | Article |
Language: | English |
Published: | American Physical Society, 2019-07-01 |
Series: | Physical Review Physics Education Research |
Online Access: | http://doi.org/10.1103/PhysRevPhysEducRes.15.020110 |
_version_ | 1818732847464185856 |
---|---|
author | Elli J. Theobald, Melissa Aikens, Sarah Eddy, Hannah Jordt |
collection | DOAJ |
description | [This paper is part of the Focused Collection on Quantitative Methods in PER: A Critical Examination.] A common goal in discipline-based education research (DBER) is to determine how to improve student outcomes. Linear regression is a common technique used to test hypotheses about the effects of interventions on continuous outcomes (such as exam score) as well as to control for student nonequivalence in quasirandom experimental designs. (In quasirandom designs, subjects are not randomly assigned to treatments. For example, when treatment is assigned by classroom and observations are made on students, the design is quasirandom because treatment is assigned to the classroom, not to the subject (the student).) However, many types of outcome data cannot be appropriately analyzed with linear regression. In these instances, researchers must move beyond linear regression and implement alternative regression techniques. For example, student outcomes can be measured on binary scales (e.g., pass or fail), tightly bound scales (e.g., strongly agree to strongly disagree), or nominal scales (i.e., different discrete choices, for example, multiple tracks within a physics major), each necessitating alternative regression techniques. Here, we review generalized linear models (glms), which extend linear modeling, and specifically compare five glms that are useful for analyzing DBER data: logistic, binomial, proportional odds (also called ordinal; including censored regression), multinomial, and Poisson (including negative binomial, hurdle, and zero-inflated) regression. We introduce a diagnostic tool to help researchers identify the most appropriate glm for their own data. For each model type, we explain when, why, and how to implement the regression approach. When: we provide examples of the types of research questions and outcome data that would motivate this regression approach, including citations to articles in the DBER literature. Why: we name which linear regression assumption is violated by the data type. How: we detail implementation and interpretation of this modeling approach in R, including R syntax and code, and how to discuss the regression output in research papers. Code accompanying each analysis can be found in the online GitHub repository associated with this paper (https://github.com/ejtheobald/BeyondLinearRegression). This paper is not an exhaustive review of regression techniques, nor does it review nonregression-based analyses. Rather, it aims to compile and summarize regression techniques useful for the most common types of DBER data and to provide examples, citations, and heavily annotated R code so that researchers can easily implement each technique in their work. |
first_indexed | 2024-12-17T23:40:05Z |
format | Article |
id | doaj.art-23f5568bf4594a129a144c52f1806f35 |
institution | Directory Open Access Journal |
issn | 2469-9896 |
language | English |
last_indexed | 2024-12-17T23:40:05Z |
publishDate | 2019-07-01 |
publisher | American Physical Society |
record_format | Article |
series | Physical Review Physics Education Research |
title | Beyond linear regression: A reference for analyzing common data types in discipline based education research |
url | http://doi.org/10.1103/PhysRevPhysEducRes.15.020110 |
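The description above pairs outcome data types with regression families. As a rough illustration only, that pairing can be sketched as a small lookup in Python. This is not the diagnostic tool from the paper, and it is not the authors' R code (which lives in the linked repository). The names `Outcome`, `MODEL_FOR_OUTCOME`, and `suggest_model` are hypothetical, and the `overdispersed` and `excess_zeros` rules reflect general statistical practice rather than anything stated in this record.

```python
from enum import Enum


class Outcome(Enum):
    """Outcome data types (hypothetical labels for this sketch)."""
    CONTINUOUS = "continuous"   # e.g., exam score
    BINARY = "binary"           # e.g., pass or fail
    PROPORTION = "proportion"   # successes out of a known number of trials
    ORDINAL = "ordinal"         # e.g., strongly agree to strongly disagree
    NOMINAL = "nominal"         # unordered choices, e.g., tracks within a major
    COUNT = "count"             # non-negative integer counts


# Hypothetical lookup: outcome type -> regression family named in the abstract.
MODEL_FOR_OUTCOME = {
    Outcome.CONTINUOUS: "linear regression",
    Outcome.BINARY: "logistic regression",
    Outcome.PROPORTION: "binomial regression",
    Outcome.ORDINAL: "proportional odds regression",
    Outcome.NOMINAL: "multinomial regression",
    Outcome.COUNT: "Poisson regression",
}


def suggest_model(outcome, overdispersed=False, excess_zeros=False):
    """Return a candidate regression family for the given outcome type.

    The two flags only affect count outcomes. They encode common practice
    (an assumption of this sketch, not a rule taken from the paper).
    """
    if outcome is Outcome.COUNT:
        if excess_zeros:
            return "hurdle or zero-inflated regression"
        if overdispersed:
            return "negative binomial regression"
    return MODEL_FOR_OUTCOME[outcome]
```

For example, `suggest_model(Outcome.ORDINAL)` returns the proportional odds family for a Likert-style outcome, while `suggest_model(Outcome.COUNT, overdispersed=True)` points to negative binomial regression. A real analysis would still need to check model assumptions against the data.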