Summary: | Machine learning is usually associated with big data; however, experimental and clinical datasets are often limited in size. The aim of this study was to describe how supervised machine learning can be used to classify astrocytes from a small sample into different morphological classes. Our dataset comprised only 193 cells, with unbalanced morphological classes and missing observations. We combined classification trees and ensemble algorithms (boosting and bagging) with undersampling to classify the nuclear morphology (homogeneous, dotted, wrinkled, forming crumples, and forming micronuclei) of astrocytes stained with anti-LMNB1 antibody. Accuracy, sensitivity (recall), specificity, and F1 score were assessed with bootstrapping, leave-one-out cross-validation (LOOCV), and stratified cross-validation. We found that our algorithm performed at rates above chance in predicting the morphological classes of astrocytes based on the nuclear expression of LMNB1. Boosting algorithms (tree ensemble) yielded better classifications than bagging ones (tree bagger). Moreover, leave-one-out and bootstrapping yielded better predictions than the more commonly used k-fold cross-validation. Finally, we identified four important predictors: the intensity of LMNB1 expression, nuclear area, cellular area, and soma area. Our results show that a tree ensemble can be optimized to classify morphological data from a small sample, even in the presence of highly unbalanced classes and many missing values.
|
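The summary mentions undersampling and per-class metrics (sensitivity, specificity, F1) without showing how they are computed. The following is a minimal sketch in plain Python, not the authors' implementation: the function names `undersample` and `one_vs_rest_metrics`, the toy label counts, and the choice of random undersampling to the size of the smallest class are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def undersample(labels, seed=0):
    """Return sorted indices that keep an equal number of items per class.

    Assumption: random undersampling down to the smallest class size.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    n_min = min(len(idx) for idx in by_class.values())
    kept = []
    for idx in by_class.values():
        kept.extend(rng.sample(idx, n_min))
    return sorted(kept)


def one_vs_rest_metrics(y_true, y_pred, positive):
    """Sensitivity, specificity, and F1 for one class against all others."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "f1": f1}


# Hypothetical, deliberately unbalanced toy labels (not the study's data).
toy_labels = ["homogeneous"] * 6 + ["dotted"] * 3 + ["wrinkled"] * 2
balanced_idx = undersample(toy_labels)
balanced_counts = Counter(toy_labels[i] for i in balanced_idx)

# Hypothetical predictions for a two-class toy case.
toy_metrics = one_vs_rest_metrics(
    ["a", "a", "b", "b"], ["a", "b", "b", "b"], positive="a"
)
```

In a multi-class setting such as the five nuclear morphologies, the one-vs-rest metrics would be computed once per class; how the study aggregated them across classes is not stated in the summary.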