Linear programming algorithms for detecting separated data in binary logistic regression models

This thesis is a study of the detection of separation among the sample points in binary logistic regression models. We propose a new algorithm for detecting separation and demonstrate empirically that it can be computed fast enough to be used routinely as part of the fitting process for logistic reg...


Bibliographic Details
Main Author: Konis, K
Other Authors: Ripley, B
Format: Thesis
Language: English
Published: 2007
Subjects: Statistics (see also social sciences); Computationally-intensive statistics
collection OXFORD
description This thesis is a study of the detection of separation among the sample points in binary logistic regression models. We propose a new algorithm for detecting separation and demonstrate empirically that it can be computed fast enough to be used routinely as part of the fitting process for logistic regression models. The parameter estimates of a binary logistic regression model fitted by the method of maximum likelihood sometimes do not converge to finite values. This phenomenon (also known as monotone likelihood or infinite parameters) occurs because of a condition among the sample points known as separation. There are two classes of separation. When complete separation is present among the sample points, iterative procedures for maximizing the likelihood tend to break down, which makes it clear that there is a problem with the model. However, when quasicomplete separation is present among the sample points, the iterative procedures for maximizing the likelihood tend to satisfy their convergence criterion before revealing any indication of separation. The new algorithm is based on a linear program with a nonnegative objective function that has a positive optimal value when separation is present among the sample points. We compare several approaches for solving this linear program and find that a method based on determining the feasibility of the dual to this linear program provides a numerically reliable test for separation among the sample points. A simulation study shows that this test can be computed in about the same amount of time as it takes to fit the binary logistic regression model using the method of iteratively reweighted least squares; hence the test is fast enough to be used routinely as part of the fitting procedure. An implementation of our algorithm (as well as the other methods described in this thesis) is available in the R package safeBinaryRegression.
first_indexed 2024-03-07T01:18:47Z
format Thesis
id oxford-uuid:8f9ee0d0-d78e-4101-9ab4-f9cbceed2a2a
institution University of Oxford
language English
last_indexed 2024-03-07T01:18:47Z
publishDate 2007
record_format dspace
topic Statistics (see also social sciences)
Computationally-intensive statistics
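The description above mentions a linear program with a nonnegative objective whose optimal value is positive when separation is present. The following is a minimal sketch of one such program, not necessarily the formulation used in the thesis or in safeBinaryRegression. It assumes a design matrix X of full column rank (including an intercept column), 0/1 responses y, and a box constraint on the coefficients to keep the program bounded. The function name `detect_separation` and the tolerance `tol` are hypothetical choices made for illustration. It solves the primal program directly with SciPy rather than testing feasibility of the dual.

```python
import numpy as np
from scipy.optimize import linprog


def detect_separation(X, y, tol=1e-8):
    """Return True if the rows of X are (quasi)completely separated by y.

    Illustrative sketch only. Solves
        maximise   sum_i s_i * x_i^T beta
        subject to s_i * x_i^T beta >= 0 for all i,  -1 <= beta_j <= 1,
    where s_i = +1 if y_i == 1 and s_i = -1 otherwise.
    beta = 0 is always feasible, so the optimum is nonnegative; assuming X
    has full column rank, it is positive exactly when separation is present.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    signs = np.where(y == 1, 1.0, -1.0)
    Z = signs[:, None] * X  # row i is s_i * x_i

    # linprog minimises, so negate the objective.
    c = -Z.sum(axis=0)
    result = linprog(
        c,
        A_ub=-Z,                      # -Z beta <= 0  is  Z beta >= 0
        b_ub=np.zeros(Z.shape[0]),
        bounds=[(-1.0, 1.0)] * X.shape[1],
        method="highs",
    )
    if result.status != 0:
        raise RuntimeError(f"linear program not solved: {result.message}")
    return bool(-result.fun > tol)


def with_intercept(x):
    """Prepend a column of ones to a single covariate (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones(len(x)), x])


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    print(detect_separation(with_intercept(x), [0, 0, 1, 1]))  # separated data
    print(detect_separation(with_intercept(x), [0, 1, 0, 1]))  # overlapping data
```

Under these assumptions, the first call should report separation and the second should not. The box constraint is arbitrary: any bounded region containing the origin in its interior would serve, since only the sign of the optimum matters.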