Defensible inferences from a nested sequence of logistic regressions: a guide for the perplexed

Bibliographic Details
Main Authors:	Gulsah Gurkan, Yoav Benjamini, Henry Braun
Format:	Article
Language:	English
Published:	SpringerOpen 2021-07-01
Series:	Large-scale Assessments in Education
Subjects:	PIAAC; Logistic regression; Nested model comparisons; KHB method; Multiplicity; False Discovery Rate
Online Access:	https://doi.org/10.1186/s40536-021-00111-7

Collection:	DOAJ
Abstract:	Employing nested sequences of models is a common practice when exploring the extent to which one set of variables mediates the impact of another set. Such an analysis in the context of logistic regression models confronts two challenges: (i) direct comparisons of coefficients across models are generally biased due to the changes in scale that accompany the changes in the set of explanatory variables, and (ii) conducting a large number of tests induces a problem of multiplicity that can lead to spurious findings of significance if not heeded. This article aims to illustrate a practical strategy for conducting analyses in the face of these challenges. The challenges, and how to address them, are illustrated using a subset of the findings reported by Braun (Large-scale Assess Educ 6(4):1–52, 2018. 10.1186/s40536-018-0058-x), drawn from the Programme for the International Assessment of Adult Competencies (PIAAC), an international, large-scale assessment of adults. For each country in the dataset, a nested pair of logistic regression models was fit in order to investigate the role of Educational Attainment and Cognitive Skills in mediating the impact of family background and demographic characteristics on the location of an individual’s annual income in the national income distribution. A modified version of the Karlson–Holm–Breen (KHB) method was employed to obtain an unbiased estimate of the true differences in the coefficients between nested logistic models. In order to address the issue of multiplicity, a recent generalization of the Benjamini–Hochberg (BH) False Discovery Rate (FDR)-controlling procedure to hierarchically structured hypotheses was employed and compared to two conventional methods. The differences between the changes in coefficients calculated conventionally and with the KHB adjustment varied from negligible to very substantial. When combined with the actual magnitudes of the coefficients, we concluded that the more proximal factors indeed act as strong mediators for the background factors, but less so for Age, and hardly at all for Gender. With respect to multiplicity, applying the FDR-controlling procedure yielded results very similar to those obtained by applying a standard per-comparison procedure, but quite a few more discoveries in comparison to the Bonferroni procedure. The KHB methodology illustrated here can be applied wherever there is interest in comparing nested logistic regressions. Modifications to account for probability sampling are practicable. The categorization of variables and the order of entry should be determined by substantive considerations. On the other hand, the BH procedure is perfectly general and can be implemented to address multiplicity issues in a broad range of settings.
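
Illustrative note (not part of the catalogued record): the abstract contrasts three ways of handling multiplicity. The sketch below shows how the decision rules differ, assuming a plain list of p-values. It implements the standard (flat) Benjamini–Hochberg step-up rule, not the hierarchical generalization the article reports using, and the p-values and function names are hypothetical, chosen only for illustration.

```python
from typing import List


def per_comparison(pvals: List[float], alpha: float = 0.05) -> List[bool]:
    """Reject each hypothesis whose p-value is at most alpha (no adjustment)."""
    return [p <= alpha for p in pvals]


def bonferroni(pvals: List[float], alpha: float = 0.05) -> List[bool]:
    """Reject each hypothesis whose p-value is at most alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals: List[float], alpha: float = 0.05) -> List[bool]:
    """Standard BH step-up rule: find the largest rank k such that the
    k-th smallest p-value is at most k * alpha / m, then reject the k
    hypotheses with the smallest p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * alpha / m:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, made up for illustration (not from the article).
pvals = [0.001, 0.004, 0.009, 0.015, 0.028, 0.20, 0.55, 0.90]

n_pc = sum(per_comparison(pvals))
n_bonf = sum(bonferroni(pvals))
n_bh = sum(benjamini_hochberg(pvals))
print(n_pc, n_bonf, n_bh)
```

With these made-up inputs the BH rule rejects as many hypotheses as the unadjusted rule while Bonferroni rejects fewer. That pattern depends entirely on the p-values supplied and says nothing about the article's data.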
ISSN:	2196-0739
Affiliations:	Gulsah Gurkan (Boston College); Yoav Benjamini (Tel-Aviv University); Henry Braun (Boston College)
Citation:	Large-scale Assessments in Education 9(1):1–24 (2021)

Similar Items