A Simulated Annealing Approach to Designing Optimal Decision Trees for Classification, Prescriptive, and Survival Analysis


Bibliographic Details
Main Author: Sujichantararat, Suleeporn
Other Authors: Bertsimas, Dimitris
Format: Thesis
Published: Massachusetts Institute of Technology 2024
Online Access: https://hdl.handle.net/1721.1/153845
https://orcid.org/0000-0002-6564-9316
Description
Summary: A binary decision tree is a highly interpretable machine learning model, as humans can easily understand how a prediction is made by answering a series of binary questions. Earlier work provided a powerful framework for constructing optimal decision trees by using multiple random warm starts and local search to iteratively improve each warm start until a locally optimal decision tree is found. However, local search does not guarantee global optimality unless the number of random warm starts exceeds the number of local minima. We therefore propose incorporating simulated annealing into decision tree construction, since occasionally accepting worse transformations can lead to a better final model. We focus on three problem domains (classification, prescriptive analysis, and survival analysis), producing Optimal Classification Trees with Simulated Annealing (OCT-SA), Optimal Policy Trees with Simulated Annealing (OPT-SA), and Optimal Survival Trees with Simulated Annealing (OST-SA). This approach improves on OCT, OPT, and OST by probabilistically allowing a move from the current tree to a tree with a worse objective value. A key challenge in designing OCT-SA, OPT-SA, and OST-SA is defining a simulated annealing cooling schedule that reaches a globally optimal solution in practical runtime. We evaluate OCT-SA, OPT-SA, and OST-SA on widely used real-world benchmark datasets. The evaluation metric is out-of-sample performance on unseen test sets: Gini impurity for classification, mean reward for prescriptive analysis, and local full likelihood for survival analysis. The results demonstrate that our simulated annealing framework benefits the decision tree construction process.
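The core idea the abstract describes, probabilistically accepting a move to a worse solution under a cooling temperature schedule, can be illustrated with a generic simulated annealing sketch. This is a minimal, hypothetical example over an arbitrary objective function, not the thesis's actual tree-transformation procedure; the geometric cooling schedule and the `neighbor`/`objective` names are illustrative assumptions.

```python
import math
import random

def simulated_annealing(initial, neighbor, objective,
                        t_start=1.0, t_end=1e-3, alpha=0.95, seed=0):
    """Minimize `objective` with Metropolis-style simulated annealing.

    `neighbor(state, rng)` proposes a perturbed state. Worse candidates
    are accepted with probability exp(-delta / T), which lets the search
    escape local minima that would trap pure local search.
    """
    rng = random.Random(seed)
    state, cost = initial, objective(initial)
    best_state, best_cost = state, cost
    t = t_start
    while t > t_end:
        cand = neighbor(state, rng)
        cand_cost = objective(cand)
        delta = cand_cost - cost
        # Always accept improvements; accept worse moves probabilistically.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            state, cost = cand, cand_cost
            if cost < best_cost:
                best_state, best_cost = state, cost
        t *= alpha  # geometric cooling schedule (one common choice)
    return best_state, best_cost

# Toy demo: a 1-D multimodal function with several local minima.
f = lambda x: (x - 3) ** 2 + 2 * math.sin(5 * x)
step = lambda x, rng: x + rng.uniform(-0.5, 0.5)
x_best, f_best = simulated_annealing(0.0, step, f)
```

In the tree-construction setting described above, `state` would be a decision tree, `neighbor` a local tree transformation (e.g. changing a split), and `objective` the task-specific loss; the acceptance rule is what distinguishes OCT-SA, OPT-SA, and OST-SA from purely greedy local search.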