Combining Hybrid Approach Redefinition-Multiclass Imbalance (HAR-MI) and Hybrid Sampling in Handling Multi-Class Imbalance and Overlapping

The class imbalance problem in the multi-class dataset is more challenging to manage than the problem in the two classes and this problem is more complicated if accompanied by overlapping. One method that has proven reliable in dealing with this problem is the Hybrid Approach Redefinition-Multiclass...

Full description

Bibliographic Details
Main Authors: Hartono Hartono, Erianto Ongko
Format: Article
Language:English
Published: Politeknik Negeri Padang 2021-03-01
Series:JOIV: International Journal on Informatics Visualization
Subjects:
Online Access:https://joiv.org/index.php/joiv/article/view/420
Description
Summary:The class imbalance problem in the multi-class dataset is more challenging to manage than the problem in the two classes and this problem is more complicated if accompanied by overlapping. One method that has proven reliable in dealing with this problem is the Hybrid Approach Redefinition-Multiclass Imbalance (HAR-MI) method which is classified as a hybrid approach that combines sampling and classifier ensembles. However, in terms of diversity among classifiers, a hybrid approach that combines sampling and classifier ensembles will give better results. HAR-MI provides excellent results in handling multi-class imbalances. The HAR-MI method uses SMOTE to increase the number of samples in the minority class. However, this SMOTE also has a weakness where an extremely imbalanced dataset and a large number of attributes will be over-fitting. To overcome the problem of over-fitting, the Hybrid Sampling method was proposed. HAR-MI combination with Hybrid Sampling is done to increase the number of samples in the minority class and at the same time reduce the number of noise samples in the majority class. The preprocessing stages at HAR-MI will use the Minimizing Overlapping Selection under Hybrid Sampling (MOSHS) method, and the processing stages will use Different Contribution Sampling. The results obtained will be compared with the results using Neighbourhood-based under-sampling. Overlapping and Classifier Performance will be measured using Augmented R-Value, the Matthews Correlation Coefficient (MCC), Precision, Recall, and F-Value. The results showed that HAR-MI with Hybrid Sampling gave better results in terms of Augmented R-Value, Precision, Recall, and F-Value
ISSN:2549-9610
2549-9904