Groundtruth budgeting : a novel approach to semi-supervised relation extraction in medical language

Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011.

Bibliographic Details
Main Author: Ryan, Russell J. (Russell John Wyatt)
Other Authors: Özlem Uzuner and Peter Szolovits.
Format: Thesis
Language: eng
Published: Massachusetts Institute of Technology 2011
Subjects: Electrical Engineering and Computer Science.
Online Access: http://hdl.handle.net/1721.1/66456
author Ryan, Russell J. (Russell John Wyatt)
author2 Özlem Uzuner and Peter Szolovits.
author_facet Özlem Uzuner and Peter Szolovits.
Ryan, Russell J. (Russell John Wyatt)
author_sort Ryan, Russell J. (Russell John Wyatt)
collection MIT
description Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011.
first_indexed 2024-09-23T13:45:05Z
format Thesis
id mit-1721.1/66456
institution Massachusetts Institute of Technology
language eng
last_indexed 2024-09-23T13:45:05Z
publishDate 2011
publisher Massachusetts Institute of Technology
record_format dspace
spelling mit-1721.1/66456 2019-04-10T18:00:39Z Groundtruth budgeting : a novel approach to semi-supervised relation extraction in medical language Ground truth budgeting Novel approach to semi-supervised relation extraction in medical language Ryan, Russell J. (Russell John Wyatt) Özlem Uzuner and Peter Szolovits. Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 67-69). We address the problem of weakly supervised relation extraction in hospital discharge summaries. Sentences with pre-identified concept types (for example, medication, test, problem, symptom) are labeled with the relationship between the concepts. We present a novel technique for weakly supervised bootstrapping of a classifier for this task: Groundtruth Budgeting. On highly overlapping, self-similar datasets such as the 2010 i2b2/VA challenge corpus, classifiers often perform poorly on the minority classes. To address this, we set aside a random portion of the groundtruth at the beginning of bootstrapping and gradually add it back as the classifier is bootstrapped. The classifier chooses which groundtruth samples to add by measuring the confidence of its predictions on them and selecting the samples on which its predictions are least confident. By adding samples in this fashion, the classifier is able to increase its coverage of the decision space while not adding too many majority-class examples.
We evaluate this approach on the 2010 i2b2/VA challenge corpus, consisting of 477 patient discharge summaries, and show that with a training corpus of 349 discharge summaries, budgeting 10% of the corpus achieves results equivalent to those of a bootstrapping classifier that starts with the entire corpus. We compare our results to those of other papers published in the proceedings of the 2010 Fourth i2b2/VA Shared-Task and Workshop. by Russell J. Ryan. M.Eng. 2011-10-17T21:28:01Z 2011-10-17T21:28:01Z 2011 2011 Thesis http://hdl.handle.net/1721.1/66456 756040752 eng M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 69 p. application/pdf Massachusetts Institute of Technology
spellingShingle Electrical Engineering and Computer Science.
Ryan, Russell J. (Russell John Wyatt)
Groundtruth budgeting : a novel approach to semi-supervised relation extraction in medical language
title Groundtruth budgeting : a novel approach to semi-supervised relation extraction in medical language
topic Electrical Engineering and Computer Science.
url http://hdl.handle.net/1721.1/66456
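
The abstract in this record describes holding back a random portion of the groundtruth and releasing the least-confidently predicted samples as bootstrapping proceeds. The following is a minimal Python sketch of that selection loop only, not the thesis implementation. The nearest-centroid classifier, the margin-based confidence score, and the names budgeted_bootstrap, select_least_confident, budget_fraction, and per_round are all assumptions introduced for illustration; the record does not say which classifier or confidence measure the thesis uses, and the self-labeling of unlabeled data that a full bootstrapping loop would perform is omitted.

```python
import random


def train_centroids(samples):
    """Toy stand-in classifier (assumption): one mean feature vector per label."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def confidence(centroids, features):
    """Margin between the two nearest centroids, scaled to [0, 1] (assumption)."""
    dists = sorted(
        sum((a - b) ** 2 for a, b in zip(features, centroid)) ** 0.5
        for centroid in centroids.values()
    )
    if len(dists) < 2:
        return 1.0  # fewer than two classes known: nothing to be unsure about
    nearest, runner_up = dists[0], dists[1]
    total = nearest + runner_up
    return 0.0 if total == 0 else (runner_up - nearest) / total


def select_least_confident(centroids, budget, k):
    """Split the budget into (k least-confident samples, remaining samples)."""
    ranked = sorted(budget, key=lambda sample: confidence(centroids, sample[0]))
    return ranked[:k], ranked[k:]


def budgeted_bootstrap(labeled, budget_fraction=0.1, per_round=2, seed=0):
    """Hold back a random budget, then release it round by round."""
    rng = random.Random(seed)
    pool = list(labeled)
    rng.shuffle(pool)
    n_budget = int(len(pool) * budget_fraction)
    budget, train = pool[:n_budget], pool[n_budget:]
    while budget:
        centroids = train_centroids(train)
        chosen, budget = select_least_confident(centroids, budget, per_round)
        train.extend(chosen)  # groundtruth labels are added, not predicted ones
        # A full bootstrapping loop would also self-label unlabeled data here.
    return train_centroids(train), train
```

Ranking by lowest confidence means the released samples tend to lie near the current decision boundary, which is the intuition the abstract gives for widening coverage of the decision space without adding many majority-class examples.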