A Fast Parallel Random Forest Algorithm Based on Spark
To improve the computational efficiency and classification accuracy in the context of big data, an optimized parallel random forest algorithm is proposed based on the Spark computing framework. First, a new Gini coefficient is defined to reduce the impact of feature redundancy for higher classificat...
Main Authors: | Linzi Yin, Ken Chen, Zhaohui Jiang, Xuemei Xu |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2023-05-01 |
Series: | Applied Sciences |
Subjects: | |
Online Access: | https://www.mdpi.com/2076-3417/13/10/6121 |
Similar Items
- Apache Spark ile Makine Öğrenmesi Destekli Diyabet Rahatsızlığı Tahmini [Machine Learning-Supported Diabetes Prediction with Apache Spark]
  by: Emre Yıldırım, et al.
  Published: (2022-07-01)
- Framing Apache Spark in life sciences
  by: Andrea Manconi, et al.
  Published: (2023-02-01)
- A Parallel Multiobjective PSO Weighted Average Clustering Algorithm Based on Apache Spark
  by: Huidong Ling, et al.
  Published: (2023-01-01)
- QoS-Aware Approximate Query Processing for Smart Cities Spatial Data Streams
  by: Isam Mashhour Al Jawarneh, et al.
  Published: (2021-06-01)
- Performance Evaluation of Query Plan Recommendation with Apache Hadoop and Apache Spark
  by: Elham Azhir, et al.
  Published: (2022-09-01)