An Ensemble Framework to Improve the Accuracy of Prediction Using Clustered Random-Forest and Shrinkage Methods

In prediction problems, reducing computational time is as pressing a challenge as improving the accuracy of existing algorithms. Because existing methods may fall short on both efficiency and accuracy, we use a combination of...


Bibliographic Details
Main Authors: Zari Farhadi, Hossein Bevrani, Mohammad-Reza Feizi-Derakhshi, Wonjoon Kim, Muhammad Fazal Ijaz
Format: Article
Language: English
Published: MDPI AG 2022-10-01
Series: Applied Sciences
Subjects: random forest, machine learning, ensemble learning, clustering, lasso, elastic net
Online Access: https://www.mdpi.com/2076-3417/12/20/10608
collection DOAJ
description In prediction problems, reducing computational time is as pressing a challenge as improving the accuracy of existing algorithms. Because existing methods may fall short on both efficiency and accuracy, we combine machine-learning algorithms with statistical methods. We also reduce computational time in the test phase by automatically reducing the number of trees with penalized methods and ensembling the remaining trees. We call this combined method the “ensemble of clustered and penalized random forest (ECAPRAF)”. It consists of four parts. First, k-means clustering identifies homogeneous subsets of the data and assigns them to groups. Second, a tree-based algorithm is fitted within each cluster as the predictor model; in this work, random forest is selected. Third, penalized methods reduce the number of random-forest trees by removing high-variance trees from the model, which increases accuracy and decreases computational time in the test phase. Finally, the remaining trees within each cluster are combined. Results on simulated data and two real datasets, evaluated with the WRMSE criterion, show that the proposed method outperforms the traditional random forest: the ECAPRAF–EN algorithm reduces WRMSE by approximately 12.75%, 11.82%, 12.93%, and 11.68% while selecting 99, 106, 113, and 118 trees, respectively.
first_indexed 2024-03-09T20:45:34Z
format Article
id doaj.art-52f51007791441bf9d805ac8a9f742d3
institution Directory Open Access Journal
issn 2076-3417
language English
last_indexed 2024-03-09T20:45:34Z
publishDate 2022-10-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-52f51007791441bf9d805ac8a9f742d3 2023-11-23T22:47:28Z eng MDPI AG Applied Sciences 2076-3417 2022-10-01 12 20 10608 10.3390/app122010608
Zari Farhadi: Department of Statistics, Faculty of Mathematics, Statistics and Computer Sciences, University of Tabriz, Tabriz 51666, Iran
Hossein Bevrani: Department of Statistics, Faculty of Mathematics, Statistics and Computer Sciences, University of Tabriz, Tabriz 51666, Iran
Mohammad-Reza Feizi-Derakhshi: Department of Computer Engineering, Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz 51666, Iran
Wonjoon Kim: Division of Future Convergence (HCI Science Major), Dongduk Women’s University, Seoul 02748, Korea
Muhammad Fazal Ijaz: Department of Intelligent Mechatronics Engineering, Sejong University, Seoul 05006, Korea
title An Ensemble Framework to Improve the Accuracy of Prediction Using Clustered Random-Forest and Shrinkage Methods
topic random forest
machine learning
ensemble learning
clustering
lasso
elastic net
url https://www.mdpi.com/2076-3417/12/20/10608
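The third and fourth parts described in the abstract (penalized removal of trees, then combining the remaining ones) can be illustrated with a minimal sketch. It is not the authors' implementation: the per-tree predictions are simulated for a single cluster rather than produced by k-means and a fitted random forest, the penalty is fitted with a hand-written coordinate descent and no intercept, and every name here (`penalized_tree_weights`, `pruned_predict`, `alpha`, `l1_ratio`, the noise levels) is a hypothetical choice made for illustration.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; this is what yields exact zeros."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def penalized_tree_weights(tree_preds, y, alpha, l1_ratio=1.0, n_iter=100):
    """Coordinate descent for an elastic-net fit of y on per-tree predictions.

    tree_preds[i][j] is the prediction of tree j for sample i (no intercept).
    l1_ratio=1.0 gives the lasso; values in (0, 1) mix in a ridge penalty.
    """
    n, m = len(y), len(tree_preds[0])
    weights = [0.0] * m
    residual = list(y)  # y minus the weighted tree sum; weights start at zero
    col_sq = [sum(row[j] ** 2 for row in tree_preds) / n for j in range(m)]
    for _ in range(n_iter):
        for j in range(m):
            denom = col_sq[j] + alpha * (1.0 - l1_ratio)
            if denom == 0.0:
                continue
            # Correlation of tree j with the residual that ignores tree j.
            rho = sum(
                tree_preds[i][j] * (residual[i] + tree_preds[i][j] * weights[j])
                for i in range(n)
            ) / n
            new_w = soft_threshold(rho, alpha * l1_ratio) / denom
            delta = new_w - weights[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= tree_preds[i][j] * delta
                weights[j] = new_w
    return weights


def pruned_predict(row, weights):
    """Combine only the trees that kept a nonzero weight."""
    return sum(w * p for w, p in zip(weights, row) if w != 0.0)


# Simulated data for one cluster (an assumption, not the paper's datasets):
# 5 fairly accurate "trees" and 15 high-variance ones.
random.seed(0)
y = [random.uniform(0.0, 10.0) for _ in range(200)]
noise_levels = [0.5] * 5 + [5.0] * 15
tree_preds = [[target + random.gauss(0.0, s) for s in noise_levels] for target in y]

weights = penalized_tree_weights(tree_preds, y, alpha=0.5, l1_ratio=0.5)
kept = [j for j, w in enumerate(weights) if w != 0.0]
print(f"kept {len(kept)} of {len(noise_levels)} trees")
```

Trees whose weight is shrunk to exactly zero never need to be evaluated at test time, which is where a saving in prediction cost would come from. In a full pipeline this step would presumably be repeated inside each k-means cluster, with `tree_preds` taken from the individual trees of that cluster's random forest.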