Optimal classification trees

State-of-the-art decision tree methods apply heuristics recursively to create each split in isolation, which may not capture well the underlying characteristics of the dataset. The optimal decision tree problem attempts to resolve this by creating the entire decision tree at once to achieve global optimality. In the last 25 years, algorithmic advances in integer optimization coupled with hardware improvements have resulted in an astonishing 800 billion factor speedup in mixed-integer optimization (MIO). Motivated by this speedup, we present optimal classification trees, a novel formulation of the decision tree problem using modern MIO techniques that yields the optimal decision tree for axes-aligned splits. We also show the richness of this MIO formulation by adapting it to give optimal classification trees with hyperplanes, which generates optimal decision trees with multivariate splits. Synthetic tests demonstrate that these methods recover the true decision tree more closely than heuristics, refuting the notion that optimal methods overfit the training data. We comprehensively benchmark these methods on a sample of 53 datasets from the UCI machine learning repository. We establish that these MIO methods are practically solvable on real-world datasets with sizes in the 1000s, and give average absolute improvements in out-of-sample accuracy over CART of 1–2% and 3–5% for the univariate and multivariate cases, respectively. Furthermore, we identify that optimal classification trees are likely to outperform CART by 1.2–1.3% in situations where the CART accuracy is high and we have sufficient training data, while the multivariate version outperforms CART by 4–7% when the CART accuracy or dimension of the dataset is low.
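
The abstract describes the mixed-integer optimization (MIO) approach only at a high level. As a rough, hypothetical sketch of the style of formulation involved (not the paper's actual OCT model), the following Python program uses the PuLP modeling library with its bundled CBC solver to pick a single optimal axis-aligned split, i.e. a depth-1 tree. The variables a, b, z, m, c_left, c_right and the big-M routing constraints are illustrative simplifications.

```python
# A minimal, hypothetical sketch: choosing one optimal axis-aligned split
# (a depth-1 tree) by mixed-integer optimization with PuLP/CBC. This
# illustrates the style of MIO tree formulations, not the paper's actual
# optimal-classification-trees model.
import pulp

# Toy data: features assumed scaled to [0, 1]; binary labels in {0, 1}.
X = [[0.1, 0.9], [0.2, 0.8], [0.8, 0.3], [0.9, 0.1]]
y = [0, 0, 1, 1]
n, p = len(X), len(X[0])
eps, M = 1e-4, 2.0  # strict-inequality margin and big-M constant

prob = pulp.LpProblem("optimal_stump", pulp.LpMinimize)
a = [pulp.LpVariable(f"a_{j}", cat="Binary") for j in range(p)]  # split feature
b = pulp.LpVariable("b", lowBound=0, upBound=1)                  # split threshold
z = [pulp.LpVariable(f"z_{i}", cat="Binary") for i in range(n)]  # 1 if i goes left
c_left = pulp.LpVariable("c_left", cat="Binary")                 # label of left leaf
c_right = pulp.LpVariable("c_right", cat="Binary")               # label of right leaf
m = [pulp.LpVariable(f"m_{i}", lowBound=0) for i in range(n)]    # misclassified?

prob += pulp.lpSum(m)       # objective: training misclassifications
prob += pulp.lpSum(a) == 1  # the split uses exactly one feature

for i in range(n):
    ax = pulp.lpSum(a[j] * X[i][j] for j in range(p))  # selected feature value
    # Big-M routing: z_i = 1 forces a'x_i <= b - eps (go left), z_i = 0
    # forces a'x_i >= b (go right); the inactive constraint is vacuous.
    prob += ax <= b - eps + M * (1 - z[i])
    prob += ax >= b - M * z[i]
    # m_i is forced to 1 whenever point i lands in a leaf whose label
    # disagrees with y_i (each pair of constraints covers one leaf).
    if y[i] == 1:
        prob += m[i] >= z[i] - c_left
        prob += m[i] >= (1 - z[i]) - c_right
    else:
        prob += m[i] >= z[i] + c_left - 1
        prob += m[i] >= c_right - z[i]

prob.solve(pulp.PULP_CBC_CMD(msg=0))
j_star = max(range(p), key=lambda j: a[j].varValue)
print(f"split: x[{j_star}] < {b.varValue:.3f}, "
      f"training errors = {pulp.value(prob.objective):.0f}")
```

A full-depth formulation like the one the paper benchmarks composes the same ingredients (binary split-selection variables, big-M routing constraints, and misclassification indicators) over every branch node and leaf of a fixed-depth tree, so the entire tree is optimized at once rather than grown greedily split by split.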

Bibliographic Details
Main Authors: Bertsimas, Dimitris; Dunn, Jack William
Other Authors: Massachusetts Institute of Technology. Operations Research Center
Format: Article
Language: English
Published: Springer US, 2017
Citation: Bertsimas, Dimitris, and Jack Dunn. “Optimal Classification Trees.” Machine Learning 106.7 (2017): 1039–1082.
ISSN: 0885-6125, 1573-0565
DOI: http://dx.doi.org/10.1007/s10994-017-5633-9
License: Creative Commons Attribution-Noncommercial-Share Alike (http://creativecommons.org/licenses/by-nc-sa/4.0/)
Online Access: http://hdl.handle.net/1721.1/110328
ORCID: https://orcid.org/0000-0002-6936-4502