Linear programming algorithms for detecting separated data in binary logistic regression models

This thesis is a study of the detection of separation among the sample points in binary logistic regression models. We propose a new algorithm for detecting separation and demonstrate empirically that it can be computed fast enough to be used routinely as part of the fitting process for logistic reg...


Bibliographic Details
Main Author:	Konis, K
Other Authors:	Ripley, B
Format:	Thesis
Language:	English
Published:	2007
Subjects:	Statistics (see also social sciences); Computationally-intensive statistics

author	Konis, K
author2	Ripley, B
collection	OXFORD
description	This thesis is a study of the detection of separation among the sample points in binary logistic regression models. We propose a new algorithm for detecting separation and demonstrate empirically that it can be computed fast enough to be used routinely as part of the fitting process for logistic regression models. The parameter estimates of a binary logistic regression model fit using the method of maximum likelihood sometimes do not converge to finite values. This phenomenon (also known as monotone likelihood or infinite parameters) occurs because of a condition among the sample points known as separation. There are two classes of separation. When complete separation is present among the sample points, iterative procedures for maximizing the likelihood tend to break down, which makes it clear that there is a problem with the model. However, when quasicomplete separation is present among the sample points, the iterative procedures for maximizing the likelihood tend to satisfy their convergence criterion before revealing any indication of separation. The new algorithm is based on a linear program with a nonnegative objective function that has a positive optimal value when separation is present among the sample points. We compare several approaches for solving this linear program and find that a method based on determining the feasibility of the dual to this linear program provides a numerically reliable test for separation among the sample points. A simulation study shows that this test can be computed in about the same time as it takes to fit the binary logistic regression model by iteratively reweighted least squares; hence the test is fast enough to be used routinely as part of the fitting procedure. An implementation of our algorithm (as well as the other methods described in this thesis) is available in the R package safeBinaryRegression.
first_indexed	2024-03-07T01:18:47Z
format	Thesis
id	oxford-uuid:8f9ee0d0-d78e-4101-9ab4-f9cbceed2a2a
institution	University of Oxford
language	English
last_indexed	2024-03-07T01:18:47Z
publishDate	2007
record_format	dspace
title	Linear programming algorithms for detecting separated data in binary logistic regression models
topic	Statistics (see also social sciences); Computationally-intensive statistics
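
The description in this record refers to monotone likelihood and to two classes of separation. As a rough illustration only, the Python sketch below covers the simplest case of an intercept plus a single covariate. It is not the linear-programming algorithm of the thesis and does not use the safeBinaryRegression package; the function names (softplus, log_likelihood, separation_type_1d) and the toy data are assumptions made for this example. It shows that, on completely separated data, the log-likelihood keeps increasing toward zero as the coefficients are scaled up along a separating direction, so no finite maximum exists, and it classifies a one-covariate data set by comparing the covariate ranges of the two response groups.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def log_likelihood(beta0, beta1, xs, ys):
    """Log-likelihood of a binary logistic model with intercept and one covariate."""
    total = 0.0
    for x, y in zip(xs, ys):
        eta = beta0 + beta1 * x
        # log P(y = 1) = -softplus(-eta); log P(y = 0) = -softplus(eta)
        total -= softplus(-eta) if y == 1 else softplus(eta)
    return total


def separation_type_1d(xs, ys):
    """Classify data for an intercept-plus-one-covariate model.

    Returns "complete", "quasicomplete", or "overlap". This shortcut is
    valid only for a single covariate; it is a hypothetical helper for
    this example, not the general test described in the thesis.
    """
    x0 = [x for x, y in zip(xs, ys) if y == 0]
    x1 = [x for x, y in zip(xs, ys) if y == 1]
    if not x0 or not x1:
        # Only one response value observed: the intercept alone separates.
        return "complete"
    if max(x0) < min(x1) or max(x1) < min(x0):
        return "complete"
    touching = max(x0) == min(x1) or max(x1) == min(x0)
    if touching and min(xs) != max(xs):
        # The groups meet at a single boundary value but do not cross it.
        return "quasicomplete"
    return "overlap"


if __name__ == "__main__":
    xs = [1, 2, 3, 5, 6, 7]
    ys = [0, 0, 0, 1, 1, 1]
    print(separation_type_1d(xs, ys))
    # Scale the coefficients along the separating direction (beta0, beta1) = t * (-4, 1).
    for t in (0, 1, 2, 4, 8):
        print(t, log_likelihood(-4.0 * t, 1.0 * t, xs, ys))
```

With more than one covariate the range comparison no longer applies, which is where a general-purpose test such as the linear program described above is needed.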
