An Ensemble Framework to Improve the Accuracy of Prediction Using Clustered Random-Forest and Shrinkage Methods

In prediction problems, reducing computational time, in addition to increasing the accuracy of existing algorithms, is a challenging issue that has attracted much attention. Because existing methods may lack sufficient efficiency and accuracy, we use a combination of...

Bibliographic Details
Main Authors:	Zari Farhadi, Hossein Bevrani, Mohammad-Reza Feizi-Derakhshi, Wonjoon Kim, Muhammad Fazal Ijaz
Format:	Article
Language:	English
Published:	MDPI AG 2022-10-01
Series:	Applied Sciences
Subjects:	random forest; machine learning; ensemble learning; clustering; lasso; elastic net
Online Access:	https://www.mdpi.com/2076-3417/12/20/10608

_version_	1797475472435576832
author	Zari Farhadi, Hossein Bevrani, Mohammad-Reza Feizi-Derakhshi, Wonjoon Kim, Muhammad Fazal Ijaz
author_facet	Zari Farhadi, Hossein Bevrani, Mohammad-Reza Feizi-Derakhshi, Wonjoon Kim, Muhammad Fazal Ijaz
author_sort	Zari Farhadi
collection	DOAJ
description	In prediction problems, reducing computational time, in addition to increasing the accuracy of existing algorithms, is a challenging issue that has attracted much attention. Because existing methods may lack sufficient efficiency and accuracy, we combine machine-learning algorithms with statistical methods to address this problem. Furthermore, we reduce the computational time in the test phase by automatically reducing the number of trees using penalized methods and ensembling the remaining trees. We call this efficient combined method the “ensemble of clustered and penalized random forest (ECAPRAF)”. The method consists of four fundamental parts. In the first part, k-means clustering is used to identify homogeneous subsets of the data and assign them to similar groups. In the second part, a tree-based algorithm is used within each cluster as the predictor model; in this work, random forest is selected. In the third part, penalized methods are used to reduce the number of random-forest trees and remove high-variance trees from the proposed model, which increases model accuracy and decreases computational time in the test phase. In the last part, the remaining trees within each cluster are combined. Results from a simulation and two real datasets, evaluated with the WRMSE criterion, show that the proposed method outperforms the traditional random forest: the ECAPRAF–EN algorithm reduces WRMSE by approximately 12.75%, 11.82%, 12.93%, and 11.68% while selecting 99, 106, 113, and 118 trees.
first_indexed	2024-03-09T20:45:34Z
format	Article
id	doaj.art-52f51007791441bf9d805ac8a9f742d3
institution	Directory of Open Access Journals
issn	2076-3417
language	English
last_indexed	2024-03-09T20:45:34Z
publishDate	2022-10-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
spelling	doaj.art-52f51007791441bf9d805ac8a9f742d3; 2023-11-23T22:47:28Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2022-10-01; vol. 12, no. 20, art. 10608; DOI 10.3390/app122010608; An Ensemble Framework to Improve the Accuracy of Prediction Using Clustered Random-Forest and Shrinkage Methods; Zari Farhadi (Department of Statistics, Faculty of Mathematics, Statistics and Computer Sciences, University of Tabriz, Tabriz 51666, Iran); Hossein Bevrani (Department of Statistics, Faculty of Mathematics, Statistics and Computer Sciences, University of Tabriz, Tabriz 51666, Iran); Mohammad-Reza Feizi-Derakhshi (Department of Computer Engineering, Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz 51666, Iran); Wonjoon Kim (Division of Future Convergence (HCI Science Major), Dongduk Women’s University, Seoul 02748, Korea); Muhammad Fazal Ijaz (Department of Intelligent Mechatronics Engineering, Sejong University, Seoul 05006, Korea); https://www.mdpi.com/2076-3417/12/20/10608; random forest; machine learning; ensemble learning; clustering; lasso; elastic net
title	An Ensemble Framework to Improve the Accuracy of Prediction Using Clustered Random-Forest and Shrinkage Methods
title_sort	ensemble framework to improve the accuracy of prediction using clustered random forest and shrinkage methods
topic	random forest; machine learning; ensemble learning; clustering; lasso; elastic net
url	https://www.mdpi.com/2076-3417/12/20/10608
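
The description field of this record says that penalized methods (lasso, elastic net) are used to reduce the number of random-forest trees, and that the remaining trees are then combined. The pure-Python sketch below illustrates one way such a step could work; it is not the authors' implementation. It assumes each tree's predictions are already available as a column of a matrix, fits non-negative-or-negative weights by coordinate descent on the standard elastic-net objective (no intercept), and keeps only trees with a nonzero weight. The function names, the toy numbers, and the choice of objective are all assumptions made for illustration; clustering and tree training are left out.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the lasso shrinkage step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net_tree_weights(tree_preds, y, lam, alpha=1.0, n_iter=200):
    """Hypothetical helper: coordinate descent for
        (1/2n) * ||y - P w||^2 + lam * (alpha * |w|_1 + (1 - alpha) / 2 * |w|_2^2)
    where column j of P (tree_preds) holds the predictions of tree j.
    alpha=1.0 is the lasso; 0 < alpha < 1 is the elastic net.
    """
    n, n_trees = len(y), len(tree_preds[0])
    weights = [0.0] * n_trees
    residual = list(y)  # y - P w, starting from w = 0
    for _ in range(n_iter):
        for j in range(n_trees):
            col = [row[j] for row in tree_preds]
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue
            # Fit of tree j to the residual with its own contribution added back.
            rho = sum(c * (r + c * weights[j]) for c, r in zip(col, residual)) / n
            new_w = soft_threshold(rho, lam * alpha) / (z + lam * (1.0 - alpha))
            delta = new_w - weights[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                weights[j] = new_w
    return weights


def prune_and_combine(tree_preds, weights):
    """Keep trees with nonzero weight and return their weighted combination."""
    kept = [j for j, w in enumerate(weights) if w != 0.0]
    combined = [sum(row[j] * weights[j] for j in kept) for row in tree_preds]
    return kept, combined


# Toy stand-ins for per-tree predictions (rows = samples, columns = trees).
# Column 0 tracks the target; columns 1 and 2 only oscillate around zero.
y = [1.0, 2.0, 3.0, 4.0]
tree_preds = [
    [1.0, 1.0, 0.5],
    [2.0, -1.0, -0.5],
    [3.0, 1.0, -0.5],
    [4.0, -1.0, 0.5],
]
weights = elastic_net_tree_weights(tree_preds, y, lam=1.0, alpha=1.0)
kept, combined = prune_and_combine(tree_preds, weights)
print(kept, weights)
```

On this toy input only the first column survives the penalty, so prediction at test time would evaluate one tree instead of three. In the method the record describes, a selection of this kind would be carried out within each k-means cluster; the record does not give the penalty values or the exact objective, so `lam` and `alpha` here are placeholders.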
