Diagnostic Classification and Biomarker Identification of Alzheimer’s Disease with Random Forest Algorithm

Random Forest (RF) is a bagging ensemble model and has many important advantages, such as robustness to noise, an effective structure for complex multimodal data and parallel computing, and also provides important features that help investigate biomarkers. Despite these benefits, RF is not used acti...

Bibliographic Details
Main Authors:	Minseok Song, Hyeyoom Jung, Seungyong Lee, Donghyeon Kim, Minkyu Ahn
Format:	Article
Language:	English
Published:	MDPI AG 2021-04-01
Series:	Brain Sciences
Subjects:	Alzheimer’s disease; mild-cognitive impairment; magnetic resonance imaging; machine learning; Random Forest; feature importance
Online Access:	https://www.mdpi.com/2076-3425/11/4/453

Description:	Random Forest (RF) is a bagging ensemble model with several important advantages: robustness to noise, a structure well suited to complex multimodal data and parallel computing, and feature-importance estimates that help in investigating biomarkers. Despite these benefits, RF is not widely used to predict Alzheimer’s disease (AD) from brain MRIs. Recent studies have reported RF’s effectiveness in predicting AD, but their test sample sizes were too small to draw solid conclusions. Thus, it is timely to compare RF with other learning models, including deep learning, on a large amount of data. In this study, we tested RF and various machine learning models with regional volumes from 2250 brain MRIs provided by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database: 687 normal controls (NC), 1094 mild cognitive impairment (MCI), and 469 AD. Three feature sets (63, 29, and 22 features) were selected, and classification accuracies were computed with RF, support vector machine (SVM), multi-layer perceptron (MLP), and convolutional neural network (CNN). RF, MLP, and CNN showed high accuracies of 90.2%, 89.6%, and 90.5%, respectively, with 63 features. Interestingly, when 22 features were used, RF showed the smallest change in accuracy (−3.8%), and its standard deviation did not change significantly, whereas MLP and CNN showed changes in accuracy of −6.8% and −4.5%, with the standard deviation rising from 3.3% to 4.0% for MLP and from 2.1% to 7.0% for CNN, indicating that RF predicts AD more reliably with fewer features. In addition, we investigated the feature importances that RF provides and identified the hippocampus, amygdala, and inferior lateral ventricle as the major contributors in classifying NC, MCI, and AD. On average, AD showed smaller hippocampus and amygdala volumes and a larger inferior lateral ventricle volume than MCI and NC.
Author Affiliations:	Minseok Song, Hyeyoom Jung, Seungyong Lee, Minkyu Ahn: School of Computer Science and Electrical Engineering, Handong Global University, Pohang-si 37554, Korea; Donghyeon Kim: Neurophet Inc., Gangnam-gu, Seoul 08380, Korea
Volume/Issue:	11 (4), article 453
ISSN:	2076-3425
DOI:	10.3390/brainsci11040453
Collection:	DOAJ (Directory of Open Access Journals)
Record ID:	doaj.art-435ee3d6d68a40cea9ca7ba330770b18
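
The workflow summarized in the description (train an RF, read off its feature importances, then compare accuracy across 63, 29, and 22 features) can be illustrated with a minimal sketch. This is not the authors' code: it assumes scikit-learn, uses synthetic data from `make_classification` in place of the ADNI regional volumes, and picks the reduced feature sets by importance ranking, which is an assumption, since this record does not say how the paper's feature sets were chosen. The tree count and fold count are likewise arbitrary illustrative values.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for regional brain volumes (NOT ADNI data):
# 63 features and 3 classes, mirroring NC / MCI / AD only in shape.
X, y = make_classification(
    n_samples=600, n_features=63, n_informative=10, n_redundant=5,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)


def cv_accuracy(features):
    """Return mean and std of 5-fold cross-validated accuracy for a fresh RF."""
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    scores = cross_val_score(model, features, y, cv=5, scoring="accuracy")
    return scores.mean(), scores.std()


# Fit once on all features to obtain impurity-based feature importances.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
ranking = np.argsort(rf.feature_importances_)[::-1]  # most important first

# Compare accuracy when keeping only the top-k ranked features.
# Simplification: the ranking was computed on the full data set, so the
# reduced-set scores are optimistic; a careful study would rank per fold.
results = {}
for k in (63, 29, 22):
    results[k] = cv_accuracy(X[:, ranking[:k]])
    mean, std = results[k]
    print(f"{k:2d} features: accuracy = {mean:.3f} +/- {std:.3f}")
```

The numbers printed by this sketch depend entirely on the synthetic data and say nothing about the accuracies reported in the article; the point is only the shape of the comparison, where both the mean accuracy and its spread across folds are tracked as the feature set shrinks.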

Similar Items