Prognostic models for mesothelioma : variable selection and machine learning

Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005.

Bibliographic Details
Main Author: Vantzelfde, Nathan Hans
Other Authors: Lucila Ohno-Machado.
Format: Thesis
Language: eng
Published: Massachusetts Institute of Technology 2006
Subjects: Electrical Engineering and Computer Science.
Online Access: http://hdl.handle.net/1721.1/33370
Notes: Includes bibliographical references (leaves 103-107).

Abstract:

Malignant pleural mesothelioma is a rare and lethal form of cancer affecting the external lining of the lungs. Extrapleural pneumonectomy (EPP), which involves the removal of the affected lung, is one of the few treatments that has been shown to have some effectiveness against the disease [39], but the procedure carries a high risk of mortality and morbidity [8]. This paper is concerned with building models that use gene expression levels to predict patient survival following EPP; these models could potentially be used to guide patient treatment.

A study by Gordon et al. built a predictor based on ratios of gene expression levels that was 88% accurate on a set of 29 independent test samples at classifying whether patients survived for a shorter or longer time than the median survival [15]. These results were recreated both on the original data set used by Gordon et al. and on a newer data set that contained the same samples but was generated using newer software. The predictors were evaluated using N-fold cross validation.

In addition, other methods of variable selection and machine learning were investigated to build different types of predictive models. These analyses used a random training set drawn from the newer data set. The models were evaluated using N-fold cross validation, and the best of each of the four main types of models (decision trees, logistic regression, artificial neural networks, and support vector machines) was tested on a small set of samples excluded from the training set.

Of these four models, the neural network with eight hidden neurons and weight decay regularization performed best, achieving a zero cross validation error rate and, on the test set, 71% accuracy, an ROC area of .67, and a logrank p value of .219. The support vector machine model with a linear kernel also had zero cross validation error and, on the test set, 71% accuracy and an ROC area of .67, but it had a higher logrank p value of .515. Both had a lower cross validation error than the ratio-based predictors of Gordon et al., which had an N-fold cross validation error rate of 35%; however, these results may not be comparable, because the neural network and support vector machine used a different training set than the Gordon et al. study. Regression analysis was also performed; the best neural network model was incorrect by an average of 4.6 months on the six test samples.

The method of variable selection based on the signal-to-noise ratio of genes, originally used by Golub et al., proved more effective on the randomly generated training set than the method involving Student's t tests and fold change used by Gordon et al. Ultimately, however, these models will need to be evaluated using a large independent test.

by Nathan Hans Vantzelfde. M.Eng.

Dates: 2005; record timestamp 2006-07-13T15:18:53Z
Identifiers: http://hdl.handle.net/1721.1/33370; 62521929
Rights: M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582
Extent: 107 leaves; 6106147 bytes, 6110573 bytes; application/pdf
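The signal-to-noise variable selection mentioned in the abstract can be illustrated with a minimal sketch. It assumes the commonly cited form of the Golub et al. score, (mean of class A minus mean of class B) divided by (standard deviation of class A plus standard deviation of class B), and ranks genes by the absolute value of that score. The function names, the use of the sample standard deviation, the zero-spread fallback, and the toy data are all assumptions made for illustration; none of them is taken from the thesis.

```python
from statistics import mean, stdev


def signal_to_noise(values_a, values_b):
    """Score one gene: (mean_a - mean_b) / (sd_a + sd_b).

    Uses the sample standard deviation (an assumption), so each class
    needs at least two samples.
    """
    spread = stdev(values_a) + stdev(values_b)
    if spread == 0:
        # Assumed fallback: a gene with no variation carries no signal.
        return 0.0
    return (mean(values_a) - mean(values_b)) / spread


def rank_genes(expression, labels, top_k):
    """Return the top_k (gene, score) pairs by absolute score.

    expression: dict mapping gene name -> list of expression levels,
                one per sample (hypothetical layout).
    labels:     list of 0/1 class labels aligned with the samples,
                e.g. 1 = survived longer than the median.
    """
    scores = {}
    for gene, levels in expression.items():
        class_a = [x for x, y in zip(levels, labels) if y == 1]
        class_b = [x for x, y in zip(levels, labels) if y == 0]
        scores[gene] = signal_to_noise(class_a, class_b)
    ranked = sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
    return [(gene, scores[gene]) for gene in ranked[:top_k]]


if __name__ == "__main__":
    # Made-up expression levels for three hypothetical genes, six samples.
    toy_expression = {
        "gene_a": [1, 2, 3, 10, 11, 12],
        "gene_b": [5, 5.1, 4.9, 5, 5.2, 4.8],
        "gene_c": [7, 7, 7, 7, 7, 7],
    }
    toy_labels = [0, 0, 0, 1, 1, 1]
    for gene, score in rank_genes(toy_expression, toy_labels, top_k=2):
        print(gene, round(score, 3))
```

When selection like this is combined with cross validation, the ranking is normally recomputed inside each fold using only that fold's training samples, so that the held-out sample does not influence which genes are chosen.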