An Ensemble Framework to Improve the Accuracy of Prediction Using Clustered Random-Forest and Shrinkage Methods
Nowadays, in the topics related to prediction, in addition to increasing the accuracy of existing algorithms, the reduction of computational time is a challenging issue that has attracted much attention. Since the existing methods may not have enough efficiency and accuracy, we use a combination of...
Main Authors: | Zari Farhadi, Hossein Bevrani, Mohammad-Reza Feizi-Derakhshi, Wonjoon Kim, Muhammad Fazal Ijaz |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-10-01 |
Series: | Applied Sciences |
Subjects: | random forest, machine learning, ensemble learning, clustering, lasso, elastic net |
Online Access: | https://www.mdpi.com/2076-3417/12/20/10608 |
_version_ | 1797475472435576832 |
---|---|
author | Zari Farhadi; Hossein Bevrani; Mohammad-Reza Feizi-Derakhshi; Wonjoon Kim; Muhammad Fazal Ijaz |
author_sort | Zari Farhadi |
collection | DOAJ |
description | Prediction research aims not only to improve the accuracy of existing algorithms but also to reduce their computational time, a challenge that has attracted much attention. Because existing methods may lack sufficient efficiency and accuracy, we combine machine-learning algorithms with statistical methods. We also reduce computational time in the test phase by automatically reducing the number of trees with penalized methods and ensembling the remaining trees. We call this combined method the “ensemble of clustered and penalized random forest (ECAPRAF)”. It consists of four parts. First, k-means clustering identifies homogeneous subsets of the data and assigns similar observations to the same group. Second, a tree-based algorithm, here random forest, is fitted within each cluster as the predictor model. Third, penalized methods reduce the number of random-forest trees and remove high-variance trees from the model, which increases accuracy and decreases computational time in the test phase. Fourth, the remaining trees within each cluster are combined. On simulated data and two real datasets, the ECAPRAF–EN algorithm outperforms the traditional random forest under the WRMSE criterion, reducing it by approximately 12.75%, 11.82%, 12.93%, and 11.68% while selecting 99, 106, 113, and 118 trees. |
first_indexed | 2024-03-09T20:45:34Z |
format | Article |
id | doaj.art-52f51007791441bf9d805ac8a9f742d3 |
institution | Directory Open Access Journal |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-09T20:45:34Z |
publishDate | 2022-10-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-52f51007791441bf9d805ac8a9f742d3; 2023-11-23T22:47:28Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2022-10-01; volume 12, issue 20, article 10608; DOI 10.3390/app122010608; An Ensemble Framework to Improve the Accuracy of Prediction Using Clustered Random-Forest and Shrinkage Methods; Zari Farhadi (Department of Statistics, Faculty of Mathematics, Statistics and Computer Sciences, University of Tabriz, Tabriz 51666, Iran); Hossein Bevrani (Department of Statistics, Faculty of Mathematics, Statistics and Computer Sciences, University of Tabriz, Tabriz 51666, Iran); Mohammad-Reza Feizi-Derakhshi (Department of Computer Engineering, Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz 51666, Iran); Wonjoon Kim (Division of Future Convergence (HCI Science Major), Dongduk Women’s University, Seoul 02748, Korea); Muhammad Fazal Ijaz (Department of Intelligent Mechatronics Engineering, Sejong University, Seoul 05006, Korea); https://www.mdpi.com/2076-3417/12/20/10608; random forest; machine learning; ensemble learning; clustering; lasso; elastic net |
title | An Ensemble Framework to Improve the Accuracy of Prediction Using Clustered Random-Forest and Shrinkage Methods |
topic | random forest; machine learning; ensemble learning; clustering; lasso; elastic net |
url | https://www.mdpi.com/2076-3417/12/20/10608 |
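
The four parts named in the description (k-means clustering, a tree ensemble per cluster, penalized tree selection, combination of the remaining trees) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation, and several choices are assumptions made for illustration: depth-1 regression stumps on bootstrap samples stand in for full random-forest trees, the elastic-net penalty is fitted by coordinate descent with hypothetical values `lam=0.05` and `alpha=0.5`, the kept trees are combined using their penalized weights (the record does not say how they are combined), and all function names and the toy data are invented here.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def kmeans(X, k, iters=20, seed=0):
    """Part 1: plain k-means; returns the centroids."""
    centers = random.Random(seed).sample(X, k)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(x, centers[c])) for x in X]
        for c in range(k):
            members = [x for x, l in zip(X, labels) if l == c]
            if members:  # keep the old centroid if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers

def fit_stump(X, y, rng):
    """Part 2 stand-in: depth-1 tree on a bootstrap sample and one random feature."""
    idx = [rng.randrange(len(X)) for _ in X]
    j = rng.randrange(len(X[0]))
    mean = sum(y[i] for i in idx) / len(idx)
    best = (float("inf"), 0.0, mean, mean)  # (sse, threshold, left value, right value)
    for t in sorted({X[i][j] for i in idx})[:-1]:
        left = [y[i] for i in idx if X[i][j] <= t]
        right = [y[i] for i in idx if X[i][j] > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if sse < best[0]:
            best = (sse, t, lm, rm)
    return j, best[1], best[2], best[3]

def predict_stump(stump, x):
    j, t, left, right = stump
    return left if x[j] <= t else right

def elastic_net(P, y, lam=0.05, alpha=0.5, sweeps=200):
    """Part 3: coordinate descent for elastic-net weights over tree predictions P."""
    n, m = len(P), len(P[0])
    cols = [[row[j] for row in P] for j in range(m)]
    w, resid = [0.0] * m, list(y)  # resid = y - P w, with w = 0 at the start
    for _ in range(sweeps):
        for j, col in enumerate(cols):
            rho = sum(p * (r + w[j] * p) for p, r in zip(col, resid)) / n
            z = sum(p * p for p in col) / n
            soft = max(abs(rho) - lam * alpha, 0.0) * (1 if rho > 0 else -1)
            denom = z + lam * (1 - alpha)
            new = soft / denom if denom > 0 else 0.0
            resid = [r - (new - w[j]) * p for p, r in zip(col, resid)]
            w[j] = new
    return w

def fit_ecapraf_sketch(X, y, k=2, n_trees=30, seed=0):
    """Cluster, fit trees per cluster, then keep only trees with nonzero weight."""
    rng = random.Random(seed)
    centers = kmeans(X, k, seed=seed)
    labels = [min(range(k), key=lambda c: dist2(x, centers[c])) for x in X]
    models = {}
    for c in range(k):
        rows = [i for i, l in enumerate(labels) if l == c]
        if not rows:
            continue
        Xc, yc = [X[i] for i in rows], [y[i] for i in rows]
        stumps = [fit_stump(Xc, yc, rng) for _ in range(n_trees)]
        P = [[predict_stump(s, x) for s in stumps] for x in Xc]
        w = elastic_net(P, yc)
        models[c] = [(s, wj) for s, wj in zip(stumps, w) if wj != 0.0]
    return centers, models

def predict(centers, models, x):
    """Part 4 (assumed): weighted sum of the kept trees in the nearest cluster."""
    c = min(models, key=lambda c: dist2(x, centers[c]))
    return sum((wj * predict_stump(s, x) for s, wj in models[c]), 0.0)

# Toy data: two separated groups with different response rules (invented).
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(60)] + \
    [[rng.gauss(8, 1), rng.gauss(8, 1)] for _ in range(60)]
y = [2 * a + b + rng.gauss(0, 0.1) if a < 4 else 20 - a + rng.gauss(0, 0.1) for a, b in X]
centers, models = fit_ecapraf_sketch(X, y)
kept = {c: len(m) for c, m in models.items()}  # trees kept per cluster, out of 30
preds = [predict(centers, models, x) for x in X]
```

Only the trees with a nonzero weight are evaluated at prediction time, which is the mechanism the description credits for the shorter test phase. Setting `alpha=1.0` in `elastic_net` gives a lasso penalty instead.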