Diagnostic Classification and Biomarker Identification of Alzheimer’s Disease with Random Forest Algorithm

Bibliographic Details
Main Authors: Minseok Song, Hyeyoom Jung, Seungyong Lee, Donghyeon Kim, Minkyu Ahn
Format: Article
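Language: English
Published: MDPI AG 2021-04-01
Series: Brain Sciences
Subjects: Alzheimer’s disease; mild-cognitive impairment; magnetic resonance imaging; machine learning; Random Forest; feature importance
Online Access: https://www.mdpi.com/2076-3425/11/4/453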
collection DOAJ
description Random Forest (RF) is a bagging ensemble model with several important advantages: it is robust to noise, its structure suits complex multimodal data and parallel computing, and it provides feature importances that help in investigating biomarkers. Despite these benefits, RF is not widely used to predict Alzheimer’s disease (AD) from brain MRIs. Recent studies have reported RF’s effectiveness in predicting AD, but their test sample sizes were too small to support solid conclusions. It is therefore timely to compare RF with other learning methods, including deep learning, on a large dataset. In this study, we tested RF and various machine learning models on regional volumes from 2250 brain MRIs provided by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database: 687 normal controls (NC), 1094 with mild cognitive impairment (MCI), and 469 with AD. Three feature sets (63, 29, and 22 features) were selected, and classification accuracies were computed with RF, support vector machine (SVM), multi-layer perceptron (MLP), and convolutional neural network (CNN). With 63 features, RF, MLP, and CNN achieved high accuracies of 90.2%, 89.6%, and 90.5%, respectively. When 22 features were used, RF showed the smallest change in accuracy (−3.8%) and its standard deviation did not change significantly, whereas MLP and CNN showed accuracy changes of −6.8% and −4.5%, with the standard deviation rising from 3.3% to 4.0% for MLP and from 2.1% to 7.0% for CNN, indicating that RF predicts AD more reliably with fewer features. In addition, we examined the feature importances that RF provides and identified the hippocampus, amygdala, and inferior lateral ventricle as the major contributors to classifying NC, MCI, and AD. On average, the AD group showed smaller hippocampus and amygdala volumes and a larger inferior lateral ventricle volume than the MCI and NC groups.
id doaj.art-435ee3d6d68a40cea9ca7ba330770b18
institution Directory Open Access Journal
issn 2076-3425
spelling doaj.art-435ee3d6d68a40cea9ca7ba330770b18; 2023-11-21T14:04:05Z; eng; MDPI AG; Brain Sciences; ISSN 2076-3425; 2021-04-01; volume 11, issue 4, article 453; DOI 10.3390/brainsci11040453
Minseok Song [0], Hyeyoom Jung [1], Seungyong Lee [2], Donghyeon Kim [3], Minkyu Ahn [4]
[0] School of Computer Science and Electrical Engineering, Handong Global University, Pohang-si 37554, Korea
[1] School of Computer Science and Electrical Engineering, Handong Global University, Pohang-si 37554, Korea
[2] School of Computer Science and Electrical Engineering, Handong Global University, Pohang-si 37554, Korea
[3] Neurophet Inc., Gangnam-gu, Seoul 08380, Korea
[4] School of Computer Science and Electrical Engineering, Handong Global University, Pohang-si 37554, Korea
topic Alzheimer’s disease
mild-cognitive impairment
magnetic resonance imaging
machine learning
Random Forest
feature importance