A bias-variance trade-off in the prediction error estimation behavior in bootstrap methods for microarray leukemia classification
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: | Tehran University of Medical Sciences, 2018-12-01 |
Series: | Journal of Biostatistics and Epidemiology |
Subjects: | |
Online Access: | https://jbe.tums.ac.ir/index.php/jbe/article/view/188 |
Summary: | Background & Aim: The bootstrap is a method that resamples from the original data set. Bootstrap methods have a wide range of applications in estimating the prediction error rate. We compare several bootstrap methods for estimating prediction error in classification and choose the best method for microarray leukemia classification.
Methods & Materials: The sample consists of n=38 patients with acute lymphoblastic leukemia (ALL) or acute myeloid leukemia (AML) and p=4120 genes, so that n<<p, taken from an existing database. We carried out the following steps. (1) Resample from the original sample. (2) Divide the sample into two sets, a learning set and a test set, by 3-fold cross-validation. (3) Train the 1NN, CART and DLDA classifiers and compute each classifier's misclassification error by comparing the predicted class of the remaining samples with the true class. (4) Average the errors over the B bootstrap samples.
Results: The standard deviation, bias and MSE were computed to compare four bootstrap methods across three classifiers. To choose the best method, we assess the bias-variance trade-off in the behavior of the prediction error estimates. The 0.632+ BT is approximately unbiased and has small variability, whereas the LOOBT procedure is biased and has large variability. A table and several figures are provided in Section 4.
Conclusion: The bias and variance of the prediction error rates vary widely across bootstrap methods. Although the 0.632+ BT is approximately unbiased and has small variability, other resampling methods may be useful for microarray classification in different situations. |
---|---|
ISSN: | 2383-4196 2383-420X |
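Illustration: the leave-one-out bootstrap (LOOBT) and 0.632+ BT estimators named in the summary can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the authors' procedure: it uses only a 1NN classifier written in plain Python, it omits the 3-fold split described in the methods, and it follows one common formulation of the 0.632+ weighting (after Efron and Tibshirani). The names `nn1_predict` and `bootstrap_errors`, the dictionary keys, and the defaults `n_boot=200` and `seed=0` are hypothetical.

```python
import random
from collections import Counter


def nn1_predict(train_x, train_y, x):
    """Return the label of the training point closest to x (squared Euclidean distance)."""
    best = min(range(len(train_x)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(train_x[j], x)))
    return train_y[best]


def bootstrap_errors(xs, ys, n_boot=200, seed=0):
    """Estimate the 1NN error rate by resubstitution, LOOBT, 0.632 and 0.632+ (sketch)."""
    rng = random.Random(seed)
    n = len(xs)

    # Resubstitution error: train and test on the full sample.
    resub_pred = [nn1_predict(xs, ys, x) for x in xs]
    err_resub = sum(p != y for p, y in zip(resub_pred, ys)) / n

    # Leave-one-out bootstrap: a sample is tested only by classifiers
    # trained on bootstrap samples that do not contain it.
    wrong = [0] * n
    tested = [0] * n
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n indices with replacement
        in_bag = set(idx)
        boot_x = [xs[i] for i in idx]
        boot_y = [ys[i] for i in idx]
        for i in range(n):
            if i not in in_bag:
                tested[i] += 1
                wrong[i] += int(nn1_predict(boot_x, boot_y, xs[i]) != ys[i])
    per_sample = [w / t for w, t in zip(wrong, tested) if t > 0]
    if not per_sample:
        raise ValueError("no sample was ever left out; increase n_boot")
    err_loo = sum(per_sample) / len(per_sample)

    # 0.632 estimator: fixed-weight mix of the two errors.
    err_632 = 0.368 * err_resub + 0.632 * err_loo

    # No-information error rate gamma, from the observed class proportions
    # and the proportions of predicted classes.
    p_obs = Counter(ys)
    p_pred = Counter(resub_pred)
    gamma = sum((p_obs[k] / n) * (1 - p_pred[k] / n) for k in p_obs)

    # Relative overfitting rate r in [0, 1], then the adaptive 0.632+ weight.
    err_loo_capped = min(err_loo, gamma)
    if err_loo_capped > err_resub and gamma > err_resub:
        r = (err_loo_capped - err_resub) / (gamma - err_resub)
    else:
        r = 0.0
    weight = 0.632 / (1 - 0.368 * r)
    err_632_plus = (1 - weight) * err_resub + weight * err_loo_capped

    return {"resub": err_resub, "loo": err_loo, "b632": err_632,
            "b632_plus": err_632_plus, "gamma": gamma}
```

Calling `bootstrap_errors(xs, ys)` with `xs` as a list of numeric feature vectors and `ys` as the matching class labels returns all four estimates plus the no-information rate. The 0.632+ weight moves from 0.632 toward 1 as the relative overfitting rate grows. This matters for 1NN, whose resubstitution error is zero whenever no two samples share the same feature vector.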