Prognostic models for mesothelioma : variable selection and machine learning

Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005.

Bibliographic Details
Main Author:	Vantzelfde, Nathan Hans
Other Authors:	Lucila Ohno-Machado.
Format:	Thesis
Language:	eng
Published:	Massachusetts Institute of Technology 2006
Subjects:	Electrical Engineering and Computer Science.
Online Access:	http://hdl.handle.net/1721.1/33370

Notes:	Includes bibliographical references (leaves 103-107).
Physical Description:	107 leaves
Rights:	M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission: http://dspace.mit.edu/handle/1721.1/7582
Abstract:
Malignant pleural mesothelioma is a rare and lethal form of cancer affecting the external lining of the lungs. Extrapleural pneumonectomy (EPP), which involves the removal of the affected lung, is one of the few treatments shown to have some effectiveness against the disease [39], but the procedure carries a high risk of mortality and morbidity [8]. This paper is concerned with building models that use gene expression levels to predict patient survival following EPP; these models could potentially be used to guide patient treatment.

A study by Gordon et al. built a predictor based on ratios of gene expression levels that was 88% accurate on a set of 29 independent test samples in classifying whether patients survived shorter or longer than the median survival [15]. These results were recreated both on the original data set used by Gordon et al. and on a newer data set that contained the same samples but was generated using newer software. The predictors were evaluated using N-fold cross validation.

In addition, other methods of variable selection and machine learning were investigated to build different types of predictive models. These analyses used a random training set drawn from the newer data set. The models were evaluated using N-fold cross validation, and the best of each of the four main types of models (decision trees, logistic regression, artificial neural networks, and support vector machines) was tested on a small set of samples excluded from the training set. Of these four models, the neural network with eight hidden neurons and weight decay regularization performed best, achieving a zero cross validation error rate and, on the test set, 71% accuracy, an ROC area of 0.67, and a logrank p-value of 0.219. The support vector machine with a linear kernel also had zero cross validation error and, on the test set, 71% accuracy and an ROC area of 0.67, but a higher logrank p-value of 0.515. Both had a lower cross validation error than the ratio-based predictors of Gordon et al., which had an N-fold cross validation error rate of 35%; however, these results may not be comparable because the neural network and support vector machine used a different training set than the Gordon et al. study.

Regression analysis was also performed; the best neural network model was incorrect by an average of 4.6 months on the six test samples. The variable selection method based on the signal-to-noise ratio of genes, originally used by Golub et al., proved more effective on the randomly generated training set than the method involving Student's t-tests and fold change used by Gordon et al. Ultimately, however, these models will need to be evaluated using a large independent test set.
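
The abstract names signal-to-noise variable selection without spelling it out. As a rough illustration only, and not the thesis's actual code, the sketch below ranks genes by the score commonly attributed to Golub et al., (mean_1 - mean_2) / (sd_1 + sd_2), computed between the longer-surviving and shorter-surviving groups. The function names, the use of the sample standard deviation, the choice to rank by absolute score, and the toy expression values are all assumptions made for this example.

```python
from statistics import mean, stdev


def signal_to_noise(group_1, group_2):
    """Score (mean_1 - mean_2) / (sd_1 + sd_2) for one gene.

    Uses the sample standard deviation (an assumption of this sketch).
    Returns 0.0 when both groups have zero spread, to avoid dividing by zero.
    """
    denominator = stdev(group_1) + stdev(group_2)
    if denominator == 0:
        return 0.0
    return (mean(group_1) - mean(group_2)) / denominator


def rank_genes(expression, labels, top_k):
    """Return the top_k gene names ordered by absolute signal-to-noise score.

    expression: dict mapping gene name -> list of expression levels, one per sample
    labels:     list of 0/1 per sample (1 = survived longer than the median)
    """
    scores = {}
    for gene, levels in expression.items():
        longer = [x for x, y in zip(levels, labels) if y == 1]
        shorter = [x for x, y in zip(levels, labels) if y == 0]
        scores[gene] = signal_to_noise(longer, shorter)
    # Rank by magnitude so genes over-expressed in either group can be selected.
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)[:top_k]


# Hypothetical toy data: three genes measured in six samples.
toy_expression = {
    "gene_a": [1.0, 1.2, 0.9, 5.0, 5.3, 4.8],  # clearly separates the groups
    "gene_b": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # no separation
    "gene_c": [3.0, 2.0, 4.0, 4.0, 3.0, 5.0],  # weak separation
}
toy_labels = [0, 0, 0, 1, 1, 1]

selected = rank_genes(toy_expression, toy_labels, top_k=2)
print(selected)
```

When selection like this is combined with cross validation, the ranking would normally be recomputed inside each training fold so that held-out samples do not influence which genes are chosen; whether the thesis did so is not stated in this record.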


Similar Items