Prognostic models for mesothelioma : variable selection and machine learning
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005.
Main Author: | Vantzelfde, Nathan Hans |
---|---|
Other Authors: | Lucila Ohno-Machado. |
Format: | Thesis |
Language: | eng |
Published: | Massachusetts Institute of Technology, 2006 |
Subjects: | Electrical Engineering and Computer Science. |
Online Access: | http://hdl.handle.net/1721.1/33370 |

_version_ | 1826191591465811968 |
---|---|
author | Vantzelfde, Nathan Hans |
author2 | Lucila Ohno-Machado. |
collection | MIT |
description | Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. |
first_indexed | 2024-09-23T08:58:18Z |
format | Thesis |
id | mit-1721.1/33370 |
institution | Massachusetts Institute of Technology |
language | eng |
last_indexed | 2024-09-23T08:58:18Z |
publishDate | 2006 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/33370 2019-04-10T13:59:24Z Prognostic models for mesothelioma : variable selection and machine learning Vantzelfde, Nathan Hans Lucila Ohno-Machado. Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. Includes bibliographical references (leaves 103-107). Malignant pleural mesothelioma is a rare and lethal form of cancer affecting the external lining of the lungs. Extrapleural pneumonectomy (EPP), which involves removal of the affected lung, is one of the few treatments shown to have some effectiveness against the disease [39], but the procedure carries a high risk of mortality and morbidity [8]. This thesis builds models that use gene expression levels to predict patient survival following EPP; such models could potentially be used to guide patient treatment. A study by Gordon et al. built a predictor based on ratios of gene expression levels that was 88% accurate on a set of 29 independent test samples in classifying whether patients survived shorter or longer than the median survival time [15]. These results were reproduced both on the original data set used by Gordon et al. and on a newer data set that contained the same samples but was generated using newer software. The predictors were evaluated using N-fold cross-validation. In addition, other methods of variable selection and machine learning were investigated to build different types of predictive models. These analyses used a randomly selected training set from the newer data set. The models were evaluated using N-fold cross-validation, and the best model of each of the four main types (decision trees, logistic regression, artificial neural networks, and support vector machines) was tested on a small set of samples excluded from the training set. Of these four models, the neural network with eight hidden neurons and weight decay regularization performed best, achieving a cross-validation error rate of zero and, on the test set, 71% accuracy, an ROC area of 0.67, and a logrank p-value of 0.219. The support vector machine with a linear kernel also had zero cross-validation error and, on the test set, 71% accuracy and an ROC area of 0.67, but a higher logrank p-value of 0.515. Both had lower cross-validation error than the ratio-based predictors of Gordon et al., which had an N-fold cross-validation error rate of 35%; however, these results may not be comparable because the neural network and support vector machine used a different training set from the Gordon et al. study. Regression analysis was also performed; the best neural network model had an average error of 4.6 months on the six test samples. On the randomly generated training set, variable selection based on the signal-to-noise ratio of genes, originally used by Golub et al., proved more effective than the method based on Student's t-tests and fold change used by Gordon et al. Ultimately, however, these models will need to be evaluated on a large independent test set. by Nathan Hans Vantzelfde. M.Eng. 2006-07-13T15:18:53Z 2006-07-13T15:18:53Z 2005 2005 Thesis http://hdl.handle.net/1721.1/33370 62521929 eng M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 107 leaves 6106147 bytes 6110573 bytes application/pdf application/pdf application/pdf Massachusetts Institute of Technology |
title | Prognostic models for mesothelioma : variable selection and machine learning |
topic | Electrical Engineering and Computer Science. |
url | http://hdl.handle.net/1721.1/33370 |
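
The abstract reports that variable selection based on the signal-to-noise ratio of Golub et al. worked better on the random training set than selection based on t-tests and fold change. As a rough illustration only, the sketch below scores each gene as (mean of class 1 − mean of class 0) / (sd of class 1 + sd of class 0) and keeps the genes with the largest absolute score. This is not the thesis's code. The gene names, expression values, 0/1 survival labels, the use of sample standard deviations, and the `rank_genes_by_snr` helper are all hypothetical. Ranking by absolute value is a simplification, and the thesis's exact selection rule is not given in this record.

```python
from statistics import mean, stdev


def signal_to_noise(class_1, class_0):
    """Golub-style signal-to-noise score for one gene:
    (mean_1 - mean_0) / (sd_1 + sd_0), using sample standard deviations."""
    noise = stdev(class_1) + stdev(class_0)  # stdev needs >= 2 samples per class
    if noise == 0:
        raise ValueError("gene has zero variance in both classes")
    return (mean(class_1) - mean(class_0)) / noise


def rank_genes_by_snr(expression, labels, top_k):
    """Return the top_k (gene, score) pairs with the largest |SNR|.

    expression: hypothetical dict of gene name -> expression level per sample
    labels:     0/1 label per sample (hypothetically 1 = survived longer
                than the median, 0 = shorter)
    """
    scores = {}
    for gene, values in expression.items():
        class_1 = [v for v, y in zip(values, labels) if y == 1]
        class_0 = [v for v, y in zip(values, labels) if y == 0]
        scores[gene] = signal_to_noise(class_1, class_0)
    # Sort by magnitude so strongly negative genes are kept as well as positive ones.
    ranked = sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:top_k]


# Toy data (made up): three genes measured in six training samples.
expression = {
    "GENE_A": [5.0, 6.0, 5.5, 1.0, 1.5, 1.2],
    "GENE_B": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
    "GENE_C": [1.0, 1.2, 0.8, 4.0, 4.5, 4.2],
}
labels = [1, 1, 1, 0, 0, 0]
ranked = rank_genes_by_snr(expression, labels, top_k=2)
print([gene for gene, _ in ranked])  # ['GENE_C', 'GENE_A']
```

When a ranking like this is combined with N-fold cross-validation, it generally has to be recomputed inside each fold from that fold's training samples alone. Selecting genes on all samples first tends to make the cross-validation error optimistic.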
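
The abstract reports an ROC area of 0.67 for both the neural network and the support vector machine on the test set. The ROC area summarizes how well a model's continuous output ranks one outcome class above the other. One standard way to compute it is the fraction of (positive, negative) sample pairs that the score orders correctly, with ties counted as one half. This equals the normalized Mann-Whitney U statistic. The sketch below assumes the model emits a score where higher means "survived longer than the median". That convention, the `roc_auc` name, and the example scores are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive sample (label 1) scores higher than a randomly chosen
    negative sample (label 0); tied pairs count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Made-up model outputs for five test patients (1 = hypothetically long survivor).
print(roc_auc([0.9, 0.8, 0.4, 0.35, 0.1], [1, 0, 1, 0, 0]))  # 0.8333333333333334
```

The pairwise loop is quadratic in the number of samples. That cost is negligible for test sets of a handful of patients, and the loop makes the ranking interpretation explicit.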
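
The abstract also reports logrank p-values (0.219 for the neural network and 0.515 for the support vector machine) on the test set. A logrank test compares the survival experience of two groups. Here the groups are presumably the patients predicted to survive shorter versus longer than the median, but that grouping is an assumption, as are the `logrank_test` name and the toy data. The sketch works through each distinct death time and accumulates the observed deaths in one group, the deaths expected under a shared hazard, and the corresponding variance. It then converts the statistic to a p-value with the chi-square distribution on one degree of freedom. With test sets as small as the one described here, this large-sample approximation should be read cautiously.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group logrank test.

    times_*  : survival or follow-up times (e.g. months after surgery)
    events_* : 1 if death was observed at that time, 0 if censored
    Returns (chi_square, p_value) with one degree of freedom.
    """
    # Tag every patient with its group: 0 for group A, 1 for group B.
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    death_times = sorted({t for t, e, _ in data if e})

    observed_a = 0.0
    expected_a = 0.0
    variance = 0.0
    for t in death_times:
        # Patients still at risk at time t (those censored at t count as at risk).
        at_risk = [(tt, e, g) for tt, e, g in data if tt >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == 0)
        d = sum(1 for tt, e, _ in at_risk if tt == t and e)
        d_a = sum(1 for tt, e, g in at_risk if tt == t and e and g == 0)

        observed_a += d_a
        expected_a += d * n_a / n  # deaths expected in A if both groups share one hazard
        if n > 1:
            # Hypergeometric variance of the death count in group A at time t.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("zero variance: the groups cannot be compared")
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Toy data (hypothetical): months of survival, all deaths observed.
short_group = [1, 2, 3, 4, 5]
long_group = [6, 7, 8, 9, 10]
chi2, p = logrank_test(short_group, [1] * 5, long_group, [1] * 5)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```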