A mixed-integer exponential cone programming formulation for feature subset selection in logistic regression

Bibliographic Details
Main Authors: Sahand Asgharieh Ahari, Burak Kocuk
Format: Article
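Language: English
Published: Elsevier 2023-01-01
Series: EURO Journal on Computational Optimization
Subjects: Mixed-integer exponential cone programming; Machine learning; Sparse logistic regression; Classification
Online Access: http://www.sciencedirect.com/science/article/pii/S2192440623000138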
collection DOAJ
description Logistic regression is one of the most widely used classification tools for constructing prediction models. For datasets with a large number of features, feature subset selection methods are used to obtain accurate and interpretable prediction models by removing irrelevant and redundant features. In this paper, we address the problem of feature subset selection in logistic regression using modern optimization techniques. To this end, we formulate this problem as a mixed-integer exponential cone program (MIEXP). To the best of our knowledge, this is the first time both the nonlinear and discrete aspects of the underlying problem are fully considered within an exact optimization framework. We derive different versions of the MIEXP model by means of regularization and goodness-of-fit measures, including the Akaike Information Criterion and the Bayesian Information Criterion. Finally, we solve our MIEXP models using the solver MOSEK and evaluate the performance of the different versions on a set of toy examples and benchmark datasets. The results show that our approach is quite successful in obtaining accurate and interpretable prediction models compared to other methods from the literature.
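The description names the model class but does not state the model. As a rough sketch only, the block below writes a cardinality-constrained logistic regression with exponential cones in a standard textbook form; it is not necessarily the authors' exact formulation. All symbols are assumptions introduced here for illustration: n samples (x_i, y_i) with labels y_i in {-1, +1}, p features, intercept beta_0, coefficients beta, binary selection indicators z_j, a big-M bound M on the coefficients, and a sparsity budget k.

```latex
% Exponential cone (closure of):
%   K_exp = { (a, b, c) : a >= b * exp(c / b),  b > 0 }
% Assumed symbols: y_i in {-1,+1}, big-M bound M, sparsity budget k.
\begin{align*}
\min_{\beta_0,\,\beta,\,t,\,u,\,v,\,z}\quad & \sum_{i=1}^{n} t_i \\
\text{s.t.}\quad
& u_i + v_i \le 1, && i = 1,\dots,n, \\
& \bigl(u_i,\ 1,\ -y_i(\beta_0 + x_i^{\top}\beta) - t_i\bigr) \in K_{\exp}, && i = 1,\dots,n, \\
& \bigl(v_i,\ 1,\ -t_i\bigr) \in K_{\exp}, && i = 1,\dots,n, \\
& -M z_j \le \beta_j \le M z_j, && j = 1,\dots,p, \\
& \sum_{j=1}^{p} z_j \le k, \qquad z \in \{0,1\}^{p}.
\end{align*}
```

Writing w_i = -y_i(beta_0 + x_i^T beta), the three rows indexed by i together state exp(w_i - t_i) + exp(-t_i) <= 1, which is equivalent to t_i >= log(1 + exp(w_i)), the logistic loss of sample i. The big-M rows force beta_j = 0 whenever z_j = 0. Under the same assumptions, an information-criterion variant would drop the budget constraint and minimize 2 * sum_i t_i + 2 * sum_j z_j (AIC) or 2 * sum_i t_i + ln(n) * sum_j z_j (BIC), up to a constant term for the intercept.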
first_indexed 2024-03-08T22:57:44Z
id doaj.art-9aac3d2d61b243388cb4bc276e5b5220
institution Directory of Open Access Journals
issn 2192-4406
last_indexed 2024-03-08T22:57:44Z
spelling doaj.art-9aac3d2d61b243388cb4bc276e5b5220
2023-12-16T06:07:00Z
eng
Elsevier
EURO Journal on Computational Optimization
2192-4406
2023-01-01
Volume 11, Article 100069
A mixed-integer exponential cone programming formulation for feature subset selection in logistic regression
Sahand Asgharieh Ahari (Faculty of Economics and Business, University of Groningen, Groningen, the Netherlands)
Burak Kocuk (Industrial Engineering Program, Sabancı University, Istanbul, Turkey; corresponding author)
topic Mixed-integer exponential cone programming
Machine learning
Sparse logistic regression
Classification