Leveraging text representations for clinical predictive tasks

Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018.

Bibliographic Details
Main Author:	Naumann, Tristan
Other Authors:	Peter Szolovits.
Format:	Thesis
Language:	eng
Published:	Massachusetts Institute of Technology 2018
Subjects:	Electrical Engineering and Computer Science.
Online Access:	http://hdl.handle.net/1721.1/118090

_version_	1811084194426650624
author	Naumann, Tristan
author2	Peter Szolovits.
author_facet	Peter Szolovits. Naumann, Tristan
author_sort	Naumann, Tristan
collection	MIT
description	Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018.
first_indexed	2024-09-23T12:46:52Z
format	Thesis
id	mit-1721.1/118090
institution	Massachusetts Institute of Technology
language	eng
last_indexed	2024-09-23T12:46:52Z
publishDate	2018
publisher	Massachusetts Institute of Technology
record_format	dspace
spelling	mit-1721.1/118090 2019-10-09T15:38:09Z Leveraging text representations for clinical predictive tasks Naumann, Tristan Peter Szolovits. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Electrical Engineering and Computer Science. Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018. Cataloged from PDF version of thesis. Includes bibliographical references (pages 93-102). The increasing prevalence of digitized clinical data creates new opportunities to use machine learning to unlock clinical insights, and ultimately improve healthcare delivery. However, while data from Electronic Health Records (EHRs) have become common, they present unique challenges. Clinical data are noisy, sparse, irregularly sampled, and often biased in their recording of health state and care patterns. Further, much of the most important information used by care staff is recorded in unstructured text notes that are not easily deciphered by non-experts. In this work, we present machine learning methods that distill large amounts of text-based clinical data into latent representations. These representations are then used to predict a variety of important outcomes. In particular, we focus on prediction tasks that can provide evidence-based risk assessment and forecasting in settings with guidelines that have not traditionally been data-driven. We consider several abstractions for clinical narrative text, and evaluate their utility on common predictive tasks, such as mortality and readmission. We argue that a "good" representation will improve performance on these tasks and that multiple representations may be necessary, as different models excel on differing tasks.
We present three case studies in which we use representations of clinical text to improve performance of clinical prediction tasks. First, we augment predictive models that used baseline clinical features by including features from clinical progress notes [31]. These features are derived using Latent Dirichlet Allocation (LDA) and incorporated as features using per-patient topic membership. Notably, this representation has the benefit of interpretable topics over which each patient can be represented as a distribution. Second, we explore the expressive power of clinical prose by evaluating the performance of several common models on both downstream clinical tasks and their ability to identify information contained in patients' notes [7]. This stands in contrast to much prior work that positions the utility of a given model solely with respect to its ability to improve downstream clinical performance. Such extrinsic evaluations are blind to much of the insight contained in the notes, thus motivating the need for intrinsic evaluations. Finally, we use the text-based metadata associated with EHR encodings to allow the transfer of predictive models from one database to another [35]. Existing machine learning methods typically assume consistency in how semantically equivalent information is encoded. However, the way information is recorded differs across institutions and over time, often rendering potentially useful data obsolescent. To address this problem, we map database-specific representations of the information to a shared set of semantic concepts, thus allowing models to be built from or transition across different databases. by Tristan Naumann. Ph. D. 2018-09-17T15:57:05Z 2018-09-17T15:57:05Z 2018 2018 Thesis http://hdl.handle.net/1721.1/118090 1052124071 eng MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission.
http://dspace.mit.edu/handle/1721.1/7582 102 pages application/pdf Massachusetts Institute of Technology
spellingShingle	Electrical Engineering and Computer Science. Naumann, Tristan Leveraging text representations for clinical predictive tasks
title	Leveraging text representations for clinical predictive tasks
title_full	Leveraging text representations for clinical predictive tasks
title_fullStr	Leveraging text representations for clinical predictive tasks
title_full_unstemmed	Leveraging text representations for clinical predictive tasks
title_short	Leveraging text representations for clinical predictive tasks
title_sort	leveraging text representations for clinical predictive tasks
topic	Electrical Engineering and Computer Science.
url	http://hdl.handle.net/1721.1/118090
work_keys_str_mv	AT naumanntristan leveragingtextrepresentationsforclinicalpredictivetasks
