A mixed-integer exponential cone programming formulation for feature subset selection in logistic regression
Main Authors: | Sahand Asgharieh Ahari, Burak Kocuk |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2023-01-01 |
Series: | EURO Journal on Computational Optimization |
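Subjects: | Mixed-integer exponential cone programming; Machine learning; Sparse logistic regression; Classification |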
Online Access: | http://www.sciencedirect.com/science/article/pii/S2192440623000138 |
_version_ | 1797389489237131264 |
---|---|
author | Sahand Asgharieh Ahari; Burak Kocuk |
collection | DOAJ |
description | Logistic regression is one of the most widely used classification tools for constructing prediction models. For datasets with a large number of features, feature subset selection methods are used to obtain accurate and interpretable prediction models by removing irrelevant and redundant features. In this paper, we address the problem of feature subset selection in logistic regression using modern optimization techniques. To this end, we formulate the problem as a mixed-integer exponential cone program (MIEXP). To the best of our knowledge, this is the first time that both the nonlinear and the discrete aspects of the underlying problem are fully considered within an exact optimization framework. We derive different versions of the MIEXP model by means of regularization and goodness-of-fit measures, including the Akaike Information Criterion and the Bayesian Information Criterion. Finally, we solve our MIEXP models using the solver MOSEK and evaluate the performance of the different versions on a set of toy examples and benchmark datasets. The results show that our approach is quite successful in obtaining accurate and interpretable prediction models compared to other methods from the literature. |
id | doaj.art-9aac3d2d61b243388cb4bc276e5b5220 |
institution | Directory of Open Access Journals |
issn | 2192-4406 |
spelling | Sahand Asgharieh Ahari (Faculty of Economics and Business, University of Groningen, Groningen, the Netherlands); Burak Kocuk (Industrial Engineering Program, Sabancı University, Istanbul, Turkey; corresponding author). EURO Journal on Computational Optimization, vol. 11, article 100069, Elsevier, 2023-01-01. ISSN 2192-4406. |
topic | Mixed-integer exponential cone programming; Machine learning; Sparse logistic regression; Classification |
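
The description mentions goodness-of-fit measures such as the Akaike and Bayesian Information Criteria. As a minimal sketch, and not the authors' MIEXP model, the following shows how these criteria are commonly computed from the logistic negative log-likelihood. The function names, the label encoding in {-1, +1}, and the convention that `k` counts the selected features plus the intercept are assumptions made for illustration only.

```python
import math

def logistic_nll(beta0, beta, X, y):
    """Negative log-likelihood of a logistic model (assumes labels in {-1, +1})."""
    total = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (beta0 + sum(b * v for b, v in zip(beta, xi)))
        # log(1 + exp(-margin)), evaluated in a numerically stable way
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total

def aic(nll, k):
    """Akaike Information Criterion: 2k - 2 ln L, with nll = -ln L."""
    return 2.0 * k + 2.0 * nll

def bic(nll, k, n):
    """Bayesian Information Criterion: k ln n - 2 ln L, with nll = -ln L."""
    return k * math.log(n) + 2.0 * nll
```

Both criteria add a penalty that grows with the number of fitted parameters, so a feature subset is preferred only if the gain in likelihood outweighs the extra parameters; BIC penalizes model size more heavily than AIC once the sample size `n` exceeds about 7.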