A Fast Parallel Random Forest Algorithm Based on Spark

To improve the computational efficiency and classification accuracy in the context of big data, an optimized parallel random forest algorithm is proposed based on the Spark computing framework. First, a new Gini coefficient is defined to reduce the impact of feature redundancy for higher classificat...

Bibliographic Details
Main Authors: Linzi Yin, Ken Chen, Zhaohui Jiang, Xuemei Xu
Format: Article
Language: English
Published: MDPI AG 2023-05-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/13/10/6121