A Simulated Annealing Approach to Designing Optimal Decision Trees for Classification, Prescriptive, and Survival Analysis

A binary decision tree is a highly interpretable machine learning model, as humans can easily understand how a prediction is made by answering a series of binary questions. Earlier work has provided a powerful framework for constructing optimal decision trees by utilizing multiple random warm starts...

Full description

Bibliographic Details
Main Author: Sujichantararat, Suleeporn
Other Authors: Bertsimas, Dimitris
Format: Thesis
Published: Massachusetts Institute of Technology 2024
Online Access:https://hdl.handle.net/1721.1/153845
https://orcid.org/0000-0002-6564-9316
_version_ 1826194367585452032
author Sujichantararat, Suleeporn
author2 Bertsimas, Dimitris
author_facet Bertsimas, Dimitris
Sujichantararat, Suleeporn
author_sort Sujichantararat, Suleeporn
collection MIT
description A binary decision tree is a highly interpretable machine learning model, as humans can easily understand how a prediction is made by answering a series of binary questions. Earlier work has provided a powerful framework for constructing optimal decision trees by utilizing multiple random warm starts and local search to iteratively improve each warm start until a locally optimal decision tree is found. However, local search does not guarantee global optimality if the number of random warm starts does not exceed the number of local minima. Hence, we propose to incorporate simulated annealing into decision tree construction, as some worse transformations could lead to a better final model. We focus on three problem domains: classification, prescriptive and survival analysis to produce Optimal Classification Trees with Simulated Annealing (OCT-SA), Optimal Policy Tree with Simulated Annealing (OPT-SA), and Optimal Survival Tree with Simulated Annealing (OST-SA). This approach further improves on OCT, OPT, and OST by probabilistically allowing a tree to move to a tree with a worse objective value. A challenge in designing OCT-SA, OPT-SA, and OST-SA is to define an appropriate simulated annealing cooling schedule that leads to a global optimal solution in practical runtime. We evaluate OCT-SA, OPT-SA, and OST-SA on widely-used benchmarking real-world datasets. The evaluation metrics are out-of-sample performances over unseen test datasets: Gini impurity scores for classification, mean rewards scores for prescriptive, and local full likelihood score for survival analysis. The results demonstrate that our simulated annealing framework benefits the decision trees construction process.
first_indexed 2024-09-23T09:54:54Z
format Thesis
id mit-1721.1/153845
institution Massachusetts Institute of Technology
last_indexed 2024-09-23T09:54:54Z
publishDate 2024
publisher Massachusetts Institute of Technology
record_format dspace
spelling mit-1721.1/1538452024-03-22T03:56:58Z A Simulated Annealing Approach to Designing Optimal Decision Trees for Classification, Prescriptive, and Survival Analysis Sujichantararat, Suleeporn Bertsimas, Dimitris Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science A binary decision tree is a highly interpretable machine learning model, as humans can easily understand how a prediction is made by answering a series of binary questions. Earlier work has provided a powerful framework for constructing optimal decision trees by utilizing multiple random warm starts and local search to iteratively improve each warm start until a locally optimal decision tree is found. However, local search does not guarantee global optimality if the number of random warm starts does not exceed the number of local minima. Hence, we propose to incorporate simulated annealing into decision tree construction, as some worse transformations could lead to a better final model. We focus on three problem domains: classification, prescriptive and survival analysis to produce Optimal Classification Trees with Simulated Annealing (OCT-SA), Optimal Policy Tree with Simulated Annealing (OPT-SA), and Optimal Survival Tree with Simulated Annealing (OST-SA). This approach further improves on OCT, OPT, and OST by probabilistically allowing a tree to move to a tree with a worse objective value. A challenge in designing OCT-SA, OPT-SA, and OST-SA is to define an appropriate simulated annealing cooling schedule that leads to a global optimal solution in practical runtime. We evaluate OCT-SA, OPT-SA, and OST-SA on widely-used benchmarking real-world datasets. The evaluation metrics are out-of-sample performances over unseen test datasets: Gini impurity scores for classification, mean rewards scores for prescriptive, and local full likelihood score for survival analysis. The results demonstrate that our simulated annealing framework benefits the decision trees construction process. Ph.D. 2024-03-21T19:09:59Z 2024-03-21T19:09:59Z 2024-02 2024-02-21T17:19:14.872Z Thesis https://hdl.handle.net/1721.1/153845 https://orcid.org/0000-0002-6564-9316 In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology
spellingShingle Sujichantararat, Suleeporn
A Simulated Annealing Approach to Designing Optimal Decision Trees for Classification, Prescriptive, and Survival Analysis
title A Simulated Annealing Approach to Designing Optimal Decision Trees for Classification, Prescriptive, and Survival Analysis
title_full A Simulated Annealing Approach to Designing Optimal Decision Trees for Classification, Prescriptive, and Survival Analysis
title_fullStr A Simulated Annealing Approach to Designing Optimal Decision Trees for Classification, Prescriptive, and Survival Analysis
title_full_unstemmed A Simulated Annealing Approach to Designing Optimal Decision Trees for Classification, Prescriptive, and Survival Analysis
title_short A Simulated Annealing Approach to Designing Optimal Decision Trees for Classification, Prescriptive, and Survival Analysis
title_sort simulated annealing approach to designing optimal decision trees for classification prescriptive and survival analysis
url https://hdl.handle.net/1721.1/153845
https://orcid.org/0000-0002-6564-9316
work_keys_str_mv AT sujichantararatsuleeporn asimulatedannealingapproachtodesigningoptimaldecisiontreesforclassificationprescriptiveandsurvivalanalysis
AT sujichantararatsuleeporn simulatedannealingapproachtodesigningoptimaldecisiontreesforclassificationprescriptiveandsurvivalanalysis