An Improved C4.5 Algorthm in Bagging Integration Model

The C4.5 algorithm has three shortcomings: the wide range of candidate segmentation threshold sequences for continuous attributes, the comprehensive influence of different attributes and local subsets under the same attribute, and the inter-attribute redundancy. When dealing with continuous attribut...

Full description

Bibliographic Details
Main Authors: Yu-Qing Song, Xu Yao, Zhe Liu, Xianbao Shen, Jingyi Mao
Format: Article
Language:English
Published: IEEE 2020-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/9231272/
Description
Summary:The C4.5 algorithm has three shortcomings: the wide range of candidate segmentation threshold sequences for continuous attributes, the comprehensive influence of different attributes and local subsets under the same attribute, and the inter-attribute redundancy. When dealing with continuous attributes, sampling and threshold supplement processing near the transition boundary of the attribute interval corresponding to the adjacent different categories are performed for narrowing the range of candate segmentation threshold sequences. By adding standardizing Euclidean distance of the attribute global and local factors to represent attribute weight, the calculation of C4.5 information gain is otpimized. And averaging Gini index of other attributes and adding correction factor, the influence of redundancy between attributes is greatly decreased. The overall average improvement range of the base classifier and the bagging integration classifier is 0.6%~2.1% and 0.7% ~ 2.7%, respectively, which shows that this integration model can improve the classification accuracy and also validate its feasibility and reliability.
ISSN:2169-3536