Large-scale consensus clustering and data ownership considerations for medical applications

Thesis: S.M. in Technology and Policy, Massachusetts Institute of Technology, Engineering Systems Division, Technology and Policy Program, 2013.

Bibliographic Details
Main Author: Ezeozue, Chidube Donald
Other Authors: Una-May O'Reilly and Kalyan Veeramachaneni.
Format: Thesis
Language: eng
Published: Massachusetts Institute of Technology 2014
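Subjects: Engineering Systems Division; Technology and Policy Program; Electrical Engineering and Computer Science
Online Access: http://hdl.handle.net/1721.1/86273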

Abstract

An intersection of events has led to a massive increase in the amount of medical data being collected from patients inside and outside the hospital. These events include the development of new sensors, the continuous decrease in the cost of data storage, the development of Big Data algorithms in other domains and the Health Information Technology for Economic and Clinical Health (HITECH) Act's $20 billion incentive for hospitals to install and use Electronic Health Record (EHR) systems. The data being collected presents an excellent opportunity to improve patient care. However, this opportunity is not without its challenges. Some of the challenges are technical in nature, not the least of which is how to efficiently process such massive amounts of data. At the other end of the spectrum, there are policy questions that deal with data privacy, confidentiality and ownership to ensure that research continues unhindered while preserving the rights and interests of the stakeholders involved.

This thesis addresses both ends of the challenge spectrum. First of all, we design and implement a number of methods for automatically discovering groups within large amounts of data, otherwise known as clustering. We believe this technique would prove particularly useful in identifying patient states, segregating cohorts of patients and hypothesis generation. Specifically, we scale a popular clustering algorithm, Expectation-Maximization (EM) for Gaussian Mixture Models to be able to run on a cloud of computers. We also give a lot of attention to the idea of Consensus Clustering which allows multiple clusterings to be merged into a single ensemble clustering. Here, we scale one existing consensus clustering algorithm, which relies on EM for multinomial mixture models. We also develop and implement a more general framework for retrofitting any consensus clustering algorithm and making it amenable to streaming data as well as distribution on a cloud.

On the policy end of the spectrum, we argue that the issue of data ownership is essential and highlight how the law in the United States has handled this issue in the past several decades, focusing on common law and state law approaches. We proceed to identify the flaws, especially the fragmentation, in the current system and make recommendations for a more equitable and efficient policy stance. The recommendations center on codifying the policy stance in Federal Law and allocating the property rights of the data to both the healthcare provider and the patient.

by Chidube Donald Ezeozue.

Departments: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; Massachusetts Institute of Technology. Engineering Systems Division; Massachusetts Institute of Technology. Technology and Policy Program.
Degrees: S.M. in Technology and Policy; S.M.
Notes: Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (pages 97-101).
Physical Description: 101 pages, application/pdf
Record Timestamps: 2019-04-12T21:40:18Z, 2014-04-25T15:48:12Z
Other Identifier: 874576898
Rights: M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582