A bias-variance trade-off in the prediction error estimation behavior in bootstrap methods for microarray leukemia classification

Bibliographic Details
Main Authors: Reza Ali Mohammadpour, Mousa Golalizadeh, Leila Moharrami
Format: Article
Language:English
Published: Tehran University of Medical Sciences 2018-12-01
Series:Journal of Biostatistics and Epidemiology
Online Access:https://jbe.tums.ac.ir/index.php/jbe/article/view/188
Description
Summary:Background & Aim: The bootstrap is a method that resamples from the original data set. Bootstrap methods are widely applied to estimating the prediction error rate. We compare several bootstrap methods for estimating prediction error in classification and choose the best method for microarray leukemia classification. Methods & Materials: The sample consists of n=38 patients with acute lymphoblastic leukemia (ALL) or acute myeloid leukemia (AML) and p=4120 genes (n<<p), taken from an existing database. We carried out the following steps. (1) Resample from the original sample. (2) Divide the sample into two sets, a learning set and a test set, by 3-fold cross-validation. (3) Train the 1NN, CART and DLDA classifiers and compute each classifier's misclassification error by comparing the predicted class of the remaining samples with the true class. (4) Average the errors over B bootstrap samples. Results: The standard deviation, bias and MSE were computed to compare four bootstrap methods across the three classifiers. To choose the best method, we assess the bias-variance trade-off in the behavior of the prediction error estimates. The 0.632+ BT is approximately unbiased and has small variability, whereas the LOOBT procedure is biased and has large variability. A table and some figures are provided in Section 4. Conclusion: The bias and variance of the prediction error estimates vary considerably across bootstrap methods. Although the 0.632+ BT is approximately unbiased and has small variability, other resampling methods may be useful for microarray classification in different situations.
ISSN:2383-4196
2383-420X
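
Illustrative sketch: The summary names the leave-one-out bootstrap (LOOBT) and the 0.632+ bootstrap without stating their formulas. The Python sketch below shows one common way these estimators are computed for a 1NN classifier. It is not the authors' code or their exact procedure (it omits the 3-fold cross-validation step, CART and DLDA). The data are synthetic, and the names make_data, predict_1nn and bootstrap_errors, the dimension p=20, and the choice B=50 are assumptions made only for illustration.

```python
import random

def make_data(n=38, p=20, shift=1.0, seed=1):
    """Synthetic two-class data (an assumption; not the leukemia data set)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for i in range(n):
        label = i % 2
        xs.append([rng.gauss(shift * label, 1.0) for _ in range(p)])
        ys.append(label)
    return xs, ys

def predict_1nn(train_x, train_y, x):
    """Label of the training point nearest to x (squared Euclidean distance)."""
    best = min(range(len(train_x)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)))
    return train_y[best]

def bootstrap_errors(xs, ys, B=50, seed=0):
    rng = random.Random(seed)
    n = len(xs)

    # Resubstitution error: train and evaluate on the full sample.
    preds = [predict_1nn(xs, ys, x) for x in xs]
    err_resub = sum(p != y for p, y in zip(preds, ys)) / n

    # Leave-one-out bootstrap: each observation is scored only by
    # classifiers trained on bootstrap samples that do not contain it.
    wrong, count = [0] * n, [0] * n
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]
        in_bag = set(idx)
        tx = [xs[i] for i in idx]
        ty = [ys[i] for i in idx]
        for i in range(n):
            if i not in in_bag:
                count[i] += 1
                wrong[i] += int(predict_1nn(tx, ty, xs[i]) != ys[i])
    per_obs = [w / c for w, c in zip(wrong, count) if c > 0]
    err_loo = sum(per_obs) / len(per_obs)

    # 0.632 estimator: fixed-weight blend of the two errors.
    err_632 = 0.368 * err_resub + 0.632 * err_loo

    # 0.632+ estimator: the weight on err_loo grows with the relative
    # overfitting rate R, measured against the no-information error gamma.
    gamma = sum((ys.count(k) / n) * (1 - preds.count(k) / n) for k in set(ys))
    err_loo_c = min(err_loo, gamma)
    if err_loo_c > err_resub and gamma > err_resub:
        R = (err_loo_c - err_resub) / (gamma - err_resub)
    else:
        R = 0.0
    err_632p = err_632 + (err_loo_c - err_resub) * (0.368 * 0.632 * R) / (1 - 0.368 * R)

    return {"resub": err_resub, "loobt": err_loo, "b632": err_632, "b632plus": err_632p}

xs, ys = make_data()
result = bootstrap_errors(xs, ys)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Because a 1NN classifier always reproduces its own training labels, the resubstitution error here is zero, so the 0.632 estimate reduces to 0.632 times the LOOBT estimate and the 0.632+ rule can only raise it. Repeating the run over many simulated data sets would be needed to estimate the bias, standard deviation and MSE of each estimator, which is the comparison the summary describes.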