Knowledge Distillation for Interpretable Clinical Time Series Outcome Prediction

A common machine learning task in healthcare is to predict a patient’s final outcome given their history of vitals and treatments. For example, sepsis is a life-threatening condition that arises when the body has an extreme response to an infection. Treating sepsis is a complicated process, and we are interested in predicting a sepsis patient’s final outcome. Neural networks are powerful models for making accurate predictions of such outcomes, but a major drawback is that they are not interpretable. For these models and algorithms to be used in the real world, they must predict treatment outcomes accurately while also making their predictions understandable.

In this thesis, we use knowledge distillation, a technique in which a model with high predictive power (the "teacher model") is used to train a model with other desirable traits, such as interpretability (the "student model"). For our teacher model, we use an LSTM, a type of recurrent neural network, to predict mortality for sepsis patients given their recent history of vital signs and treatments. For our student model, we use an autoregressive hidden Markov model (AR-HMM) to learn interpretable hidden states. To incorporate the teacher’s knowledge into the student, we use a similarity-based constraint. We evaluate a method from previous work that learns the hidden states with variational inference, and we develop and evaluate an alternative approach based on the expectation-maximization (EM) algorithm. We then analyze the interpretability of the learned states. Our results show that, although there is room for improvement in maintaining the model’s generative performance once the similarity constraint is added, the EM approach successfully incorporates the constraint, achieving predictive power comparable to the teacher model together with better interpretability.
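The distillation setup described in the abstract lends itself to a short sketch. Below is a minimal, illustrative PyTorch version, assuming a standard supervised pipeline: an LSTM teacher produces a mortality risk and a hidden representation, and a similarity-based constraint encourages the student's representation to preserve the teacher's view of which patients look alike. The names (TeacherLSTM, similarity_constraint, lam) and the exact form of the constraint are assumptions for illustration, not code or equations from the thesis, and the student's AR-HMM objective is abbreviated to a comment.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TeacherLSTM(nn.Module):
    """LSTM teacher: maps a vitals/treatment sequence to a mortality risk."""
    def __init__(self, n_features, hidden_size=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):                       # x: (batch, time, n_features)
        h, _ = self.lstm(x)
        return self.head(h[:, -1]), h[:, -1]    # mortality logit, final hidden state

def pairwise_similarity(z):
    """Cosine-similarity matrix between the representations in a batch."""
    z = F.normalize(z, dim=-1)
    return z @ z.T

def similarity_constraint(student_repr, teacher_repr):
    """Penalize disagreement between student and teacher similarity structure."""
    return F.mse_loss(pairwise_similarity(student_repr),
                      pairwise_similarity(teacher_repr))

# Inside a training loop, one might combine the student's own objective
# (e.g. the AR-HMM log-likelihood, optimized by EM or variational inference)
# with the distillation term, weighted by a hypothetical coefficient lam:
#     loss = -student_log_likelihood + lam * similarity_constraint(s_repr, t_repr)

Matching similarity structure rather than raw logits is one way such a constraint can let the student keep its own generative parameterization while still inheriting the teacher's notion of patient similarity; how the thesis formulates the constraint exactly is not reproduced here.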

Bibliographic Details
Main Author: Wong, Anna
Other Authors: Mark, Roger G.; Lehman, Li-wei
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Degree: M.Eng.
Format: Thesis
Published: Massachusetts Institute of Technology, 2023
Online Access: https://hdl.handle.net/1721.1/151355
Rights: In Copyright - Educational Use Permitted; copyright retained by author(s) (https://rightsstatements.org/page/InC-EDU/1.0/)