Spurious interaction as a result of categorization

Bibliographic Details
Main Author: Magne Thoresen
Format: Article
Language: English
Published: BMC 2019-02-01
Series: BMC Medical Research Methodology
Subjects: Categorization; Dichotomization; Interaction; Regression; Measurement error
Online Access: http://link.springer.com/article/10.1186/s12874-019-0667-2
_version_ 1818837157666619392
author Magne Thoresen
author_facet Magne Thoresen
author_sort Magne Thoresen
collection DOAJ
description Abstract
Background: It is common in applied epidemiological and clinical research to convert continuous variables into categorical variables by grouping values into categories. Such categorized variables are then often used as exposure variables in some regression model. There are numerous statistical arguments why this practice should be avoided, and in this paper we present yet another such argument.
Methods: We show that categorization may lead to spurious interaction in multiple regression models. We give precise analytical expressions for when this may happen in the linear regression model with normally distributed exposure variables, and we show by simulations that the analytical results are valid also for other distributions. Further, we give an interpretation of the results in terms of a measurement error problem.
Results: We show that, in the case of a linear model with two normally distributed exposure variables, both categorized at the same cut point, a spurious interaction will be induced unless the two variables are categorized at the median or they are uncorrelated. In simulations with exposure variables following other distributions, we confirm this general effect of categorization, but we also show that the effect of the choice of cut point varies over different distributions.
Conclusion: Categorization of continuous exposure variables leads to a number of problems, among them spurious interaction effects. Hence, this practice should be avoided and other methods should be considered.
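The result stated in the abstract can be illustrated with a small simulation. The sketch below is not the author's code: the function name, the sample size, the unit regression coefficients, the correlation of 0.7, and the cut point of 1.0 are all assumptions chosen for illustration. It draws two standard normal exposures with correlation rho, generates an outcome from a purely additive linear model (no true interaction), dichotomizes both exposures at the same cut point, and reports the fitted interaction coefficient.

```python
import numpy as np


def interaction_after_dichotomizing(rho, cut, n=400_000, seed=0):
    """Fitted interaction coefficient after dichotomizing both exposures.

    Hypothetical helper for illustration only. The data-generating
    model is additive, so any interaction found is spurious.
    """
    rng = np.random.default_rng(seed)

    # Two standard normal exposures with correlation rho (median = 0).
    cov = [[1.0, rho], [rho, 1.0]]
    x = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    x1, x2 = x[:, 0], x[:, 1]

    # Additive outcome: y = x1 + x2 + noise, with no x1 * x2 term.
    y = x1 + x2 + rng.normal(size=n)

    # Dichotomize both exposures at the same cut point.
    d1 = (x1 > cut).astype(float)
    d2 = (x2 > cut).astype(float)

    # Ordinary least squares on intercept, d1, d2 and their product.
    design = np.column_stack([np.ones(n), d1, d2, d1 * d2])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[3]


if __name__ == "__main__":
    print("correlated, cut at median:    ", interaction_after_dichotomizing(0.7, 0.0))
    print("uncorrelated, cut off-median: ", interaction_after_dichotomizing(0.0, 1.0))
    print("correlated, cut off-median:   ", interaction_after_dichotomizing(0.7, 1.0))
```

Under these assumptions, the first two estimates should be close to zero (up to sampling noise), while the third should be clearly nonzero even though the outcome was generated without any interaction. That matches the two exceptions named in the Results: a cut at the median, or uncorrelated exposures.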
first_indexed 2024-12-19T03:18:02Z
format Article
id doaj.art-8212cf4a86e64c83b15c49188bb704dd
institution Directory Open Access Journal
issn 1471-2288
language English
last_indexed 2024-12-19T03:18:02Z
publishDate 2019-02-01
publisher BMC
record_format Article
series BMC Medical Research Methodology
spelling doaj.art-8212cf4a86e64c83b15c49188bb704dd | 2022-12-21T20:37:50Z | eng | BMC | BMC Medical Research Methodology | 1471-2288 | 2019-02-01 | 19 | 1 | 1 | 8 | 10.1186/s12874-019-0667-2 | Spurious interaction as a result of categorization | Magne Thoresen | Centre for Biostatistics and Epidemiology, Department of Biostatistics, University of Oslo | Abstract Background It is common in applied epidemiological and clinical research to convert continuous variables into categorical variables by grouping values into categories. Such categorized variables are then often used as exposure variables in some regression model. There are numerous statistical arguments why this practice should be avoided, and in this paper we present yet another such argument. Methods We show that categorization may lead to spurious interaction in multiple regression models. We give precise analytical expressions for when this may happen in the linear regression model with normally distributed exposure variables, and we show by simulations that the analytical results are valid also for other distributions. Further, we give an interpretation of the results in terms of a measurement error problem. Results We show that, in the case of a linear model with two normally distributed exposure variables, both categorized at the same cut point, a spurious interaction will be induced unless the two variables are categorized at the median or they are uncorrelated. In simulations with exposure variables following other distributions, we confirm this general effect of categorization, but we also show that the effect of the choice of cut point varies over different distributions. Conclusion Categorization of continuous exposure variables leads to a number of problems, among them spurious interaction effects. Hence, this practice should be avoided and other methods should be considered. | http://link.springer.com/article/10.1186/s12874-019-0667-2 | Categorization | Dichotomization | Interaction | Regression | Measurement error
title Spurious interaction as a result of categorization
title_sort spurious interaction as a result of categorization
topic Categorization
Dichotomization
Interaction
Regression
Measurement error
url http://link.springer.com/article/10.1186/s12874-019-0667-2
work_keys_str_mv AT magnethoresen spuriousinteractionasaresultofcategorization