A Density-Based Random Forest for Imbalanced Data Classification

Bibliographic Details
Main Authors:	Jia Dong, Quan Qian
Format:	Article
Language:	English
Published:	MDPI AG 2022-03-01
Series:	Future Internet
Subjects:	density-based random forest; imbalanced data classification; boundary and density domain partition
Online Access:	https://www.mdpi.com/1999-5903/14/3/90

Description
Summary:	Many machine learning problem domains, such as the detection of fraud, spam, outliers, and anomalies, involve inherently imbalanced class distributions. However, most classification algorithms assume roughly equal sample sizes for each class, so imbalanced datasets pose a significant challenge for predictive modeling. Herein, we propose a density-based random forest algorithm (DBRF) to improve prediction performance, especially for minority classes. DBRF identifies boundary samples, which are the most difficult to classify, and then augments them using a density-based method. Two random forest classifiers are then constructed to model the augmented boundary samples and the original dataset, respectively, and the final output is determined using a bagging technique. A real-world material classification dataset and 33 publicly available imbalanced datasets were used to evaluate DBRF. Across the 34 datasets, DBRF achieved improvements of 2–15% over random forest in terms of the F1-measure and G-mean. The experimental results demonstrate that, by taking the density of objects in the feature space into account, DBRF can better classify objects located on the class boundary, including objects of minority classes.
ISSN:	1999-5903
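
Note on the evaluation metrics: the summary reports gains in F1-measure and G-mean, two scores commonly used in place of plain accuracy on imbalanced data. The sketch below shows their standard binary definitions only; it is not taken from the article, which may use multi-class or averaged variants. The function name `f1_and_gmean` and the `positive` parameter are hypothetical, chosen for illustration.

```python
import math


def f1_and_gmean(y_true, y_pred, positive=1):
    """Return (F1, G-mean) for binary labels.

    `positive` marks the minority class (an assumption of this sketch).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # minority-class recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # majority-class recall

    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # G-mean is the geometric mean of the two per-class recalls.
    gmean = math.sqrt(recall * specificity)
    return f1, gmean


# Toy data: 2 minority samples (label 1) and 8 majority samples (label 0).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_majority = [0] * 10
print(f1_and_gmean(y_true, always_majority))
```

On this toy data, a classifier that always predicts the majority class is 80% accurate, yet both scores are zero because no minority sample is recovered. That is why these metrics suit evaluations focused on minority-class performance.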

Similar Items