Improving clinical risk-stratification tools : instance-transfer for selecting relevant training data
Thesis: S.M. in Computer Science and Engineering, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014.
Main Author: | Gong, Jen J. (Jen Jian) |
---|---|
Other Authors: | John V. Guttag. |
Format: | Thesis |
Language: | eng |
Published: | Massachusetts Institute of Technology, 2014 |
Subjects: | Electrical Engineering and Computer Science. |
Online Access: | http://hdl.handle.net/1721.1/91090 |
author | Gong, Jen J. (Jen Jian) |
---|---|
author2 | John V. Guttag. |
collection | MIT |
description | Thesis: S.M. in Computer Science and Engineering, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014. |
first_indexed | 2024-09-23T14:45:22Z |
format | Thesis |
id | mit-1721.1/91090 |
institution | Massachusetts Institute of Technology |
language | eng |
last_indexed | 2024-09-23T14:45:22Z |
publishDate | 2014 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
abstract | One of the primary problems in constructing risk-stratification models for medical applications is that the data are often noisy, incomplete, and suffer from high class-imbalance. This problem becomes more severe when the total amount of data relevant to the task of interest is small. We address this problem in the context of risk-stratifying patients receiving isolated surgical aortic valve replacements (isolated AVR) for the adverse outcomes of operative mortality and stroke. We work with data from two hospitals (Hospital 1 and Hospital 2) in the Society of Thoracic Surgeons (STS) Adult Cardiac Surgery Database. Because the data available for our application of interest (target data) are limited, developing an accurate model using only these data is infeasible. Instead, we investigate transfer learning approaches to utilize data from other cardiac surgery procedures as well as from other institutions (source data). We first evaluate the effectiveness of leveraging information across procedures within a single hospital. We achieve significant improvements over baseline: at Hospital 1, the average AUC for operative mortality increased from 0.58 to 0.70. However, not all source examples are equally useful. Next, we evaluate the effectiveness of leveraging data across hospitals. We show that leveraging information across hospitals has variable utility; although it can result in worse performance (average AUC for stroke at Hospital 1 dropped from 0.61 to 0.56), it can also lead to significant improvements (average AUC for operative mortality at Hospital 1 increased from 0.70 to 0.72). Finally, we present an automated approach to leveraging the available source data. We investigate how removing source data based on how far they are from the mean of the target data affects performance. We propose an instance-weighting scheme based on these distances. This automated instance-weighting approach can achieve small, but significant improvements over using all of the data without weights (average AUC for operative mortality at Hospital 1 increased from 0.72 to 0.73). Research on these methods can have an important impact on the development of clinical risk-stratification tools targeted towards specific patient populations. |
title_alt | Instance-transfer for selecting relevant training data |
notes | Cataloged from PDF version of thesis. Includes bibliographical references (pages 66-71). |
extent | 71 pages |
date_accessioned | 2014-10-21T17:25:32Z |
identifier | 892724540 |
datestamp | 2019-04-10T21:57:06Z |
rights | M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 |
title | Improving clinical risk-stratification tools : instance-transfer for selecting relevant training data |
topic | Electrical Engineering and Computer Science. |
url | http://hdl.handle.net/1721.1/91090 |
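The abstract describes two steps: removing source examples according to their distance from the mean of the target data, and weighting source examples by those distances. It does not give the exact distance measure or weighting function. The Python sketch below illustrates the general idea under stated assumptions: features are standardized with the target data's mean and standard deviation, distance is Euclidean, target examples keep weight 1, and the source weight 1 / (1 + d / median(d)) is a hypothetical choice, not the scheme used in the thesis. All function names are illustrative, and the data are synthetic.

```python
import numpy as np


def distances_to_target_mean(X_source, X_target):
    """Euclidean distance of each source example to the target-data mean.

    Assumption (not from the thesis): features are standardized with the
    target data's mean and standard deviation so no feature dominates.
    """
    mu = X_target.mean(axis=0)
    sd = X_target.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant features in the target data
    return np.linalg.norm((X_source - mu) / sd, axis=1)


def filter_by_distance(X_source, y_source, X_target, keep_fraction):
    """Keep the `keep_fraction` of source examples closest to the target mean."""
    d = distances_to_target_mean(X_source, X_target)
    n_keep = int(np.ceil(keep_fraction * len(d)))
    idx = np.argsort(d)[:n_keep]
    return X_source[idx], y_source[idx]


def distance_weights(X_source, X_target):
    """Hypothetical weighting: closer source examples get weights nearer 1."""
    d = distances_to_target_mean(X_source, X_target)
    scale = np.median(d)
    if scale == 0:
        scale = 1.0
    return 1.0 / (1.0 + d / scale)


def build_training_set(X_source, y_source, X_target, y_target):
    """Stack target (weight 1) and source (distance-based weight) examples."""
    w_source = distance_weights(X_source, X_target)
    X = np.vstack([X_target, X_source])
    y = np.concatenate([y_target, y_source])
    w = np.concatenate([np.ones(len(y_target)), w_source])
    return X, y, w


# Synthetic example: a small target set and a larger, shifted source set.
rng = np.random.default_rng(0)
X_target = rng.normal(0.0, 1.0, size=(50, 3))
y_target = rng.integers(0, 2, size=50)
X_source = rng.normal(1.0, 2.0, size=(200, 3))
y_source = rng.integers(0, 2, size=200)

X, y, w = build_training_set(X_source, y_source, X_target, y_target)
X_kept, y_kept = filter_by_distance(X_source, y_source, X_target, keep_fraction=0.25)
```

The resulting `w` can be passed to any learner that accepts per-example weights, for example as the `sample_weight` argument of scikit-learn's `LogisticRegression.fit`. Filtering and weighting are alternatives here: filtering discards distant source examples outright, while weighting keeps them but reduces their influence.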