Large-scale consensus clustering and data ownership considerations for medical applications
Thesis: S.M. in Technology and Policy, Massachusetts Institute of Technology, Engineering Systems Division, Technology and Policy Program, 2013.
Main Author: | Ezeozue, Chidube Donald |
---|---|
Other Authors: | Una-May O'Reilly and Kalyan Veeramachaneni. |
Format: | Thesis |
Language: | eng |
Published: | Massachusetts Institute of Technology, 2014 |
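Subjects: | Engineering Systems Division. Technology and Policy Program. Electrical Engineering and Computer Science. |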
Online Access: | http://hdl.handle.net/1721.1/86273 |
_version_ | 1826208492501860352 |
---|---|
author | Ezeozue, Chidube Donald |
author2 | Una-May O'Reilly and Kalyan Veeramachaneni. |
author_facet | Una-May O'Reilly and Kalyan Veeramachaneni. Ezeozue, Chidube Donald |
author_sort | Ezeozue, Chidube Donald |
collection | MIT |
description | Thesis: S.M. in Technology and Policy, Massachusetts Institute of Technology, Engineering Systems Division, Technology and Policy Program, 2013. |
first_indexed | 2024-09-23T14:06:31Z |
format | Thesis |
id | mit-1721.1/86273 |
institution | Massachusetts Institute of Technology |
language | eng |
last_indexed | 2024-09-23T14:06:31Z |
publishDate | 2014 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/86273 2019-04-12T21:40:18Z Large-scale consensus clustering and data ownership considerations for medical applications Ezeozue, Chidube Donald Una-May O'Reilly and Kalyan Veeramachaneni. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Massachusetts Institute of Technology. Engineering Systems Division. Massachusetts Institute of Technology. Technology and Policy Program. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Engineering Systems Division. Technology and Policy Program. Electrical Engineering and Computer Science. Thesis: S.M. in Technology and Policy, Massachusetts Institute of Technology, Engineering Systems Division, Technology and Policy Program, 2013. Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (pages 97-101). An intersection of events has led to a massive increase in the amount of medical data being collected from patients inside and outside the hospital. These events include the development of new sensors, the continuous decrease in the cost of data storage, the development of Big Data algorithms in other domains and the Health Information Technology for Economic and Clinical Health (HITECH) Act's $20 billion incentive for hospitals to install and use Electronic Health Record (EHR) systems. The data being collected presents an excellent opportunity to improve patient care. However, this opportunity is not without its challenges. Some of the challenges are technical in nature, not the least of which is how to efficiently process such massive amounts of data. At the other end of the spectrum, there are policy questions that deal with data privacy, confidentiality and ownership to ensure that research continues unhindered while preserving the rights and interests of the stakeholders involved. This thesis addresses both ends of the challenge spectrum. First of all, we design and implement a number of methods for automatically discovering groups within large amounts of data, otherwise known as clustering. We believe this technique would prove particularly useful in identifying patient states, segregating cohorts of patients and hypothesis generation. Specifically, we scale a popular clustering algorithm, Expectation-Maximization (EM) for Gaussian Mixture Models to be able to run on a cloud of computers. We also give a lot of attention to the idea of Consensus Clustering which allows multiple clusterings to be merged into a single ensemble clustering. Here, we scale one existing consensus clustering algorithm, which relies on EM for multinomial mixture models. We also develop and implement a more general framework for retrofitting any consensus clustering algorithm and making it amenable to streaming data as well as distribution on a cloud. On the policy end of the spectrum, we argue that the issue of data ownership is essential and highlight how the law in the United States has handled this issue in the past several decades, focusing on common law and state law approaches. We proceed to identify the flaws, especially the fragmentation, in the current system and make recommendations for a more equitable and efficient policy stance. The recommendations center on codifying the policy stance in Federal Law and allocating the property rights of the data to both the healthcare provider and the patient. by Chidube Donald Ezeozue. S.M. in Technology and Policy S.M. 2014-04-25T15:48:12Z 2014-04-25T15:48:12Z 2013 2013 Thesis http://hdl.handle.net/1721.1/86273 874576898 eng M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 101 pages application/pdf Massachusetts Institute of Technology |
spellingShingle | Engineering Systems Division. Technology and Policy Program. Electrical Engineering and Computer Science. Ezeozue, Chidube Donald Large-scale consensus clustering and data ownership considerations for medical applications |
title | Large-scale consensus clustering and data ownership considerations for medical applications |
title_full | Large-scale consensus clustering and data ownership considerations for medical applications |
title_fullStr | Large-scale consensus clustering and data ownership considerations for medical applications |
title_full_unstemmed | Large-scale consensus clustering and data ownership considerations for medical applications |
title_short | Large-scale consensus clustering and data ownership considerations for medical applications |
title_sort | large scale consensus clustering and data ownership considerations for medical applications |
topic | Engineering Systems Division. Technology and Policy Program. Electrical Engineering and Computer Science. |
url | http://hdl.handle.net/1721.1/86273 |
work_keys_str_mv | AT ezeozuechidubedonald largescaleconsensusclusteringanddataownershipconsiderationsformedicalapplications |