An Improved C4.5 Algorithm in Bagging Integration Model

The C4.5 algorithm has three shortcomings: the wide range of candidate segmentation threshold sequences for continuous attributes, the comprehensive influence of different attributes and local subsets under the same attribute, and the inter-attribute redundancy. When dealing with continuous attribut...

Bibliographic Details
Main Authors: Yu-Qing Song, Xu Yao, Zhe Liu, Xianbao Shen, Jingyi Mao
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Subjects: Bagging integration; C4.5 algorithm; information entropy; split information
Online Access: https://ieeexplore.ieee.org/document/9231272/
_version_ 1811209874197970944
author Yu-Qing Song
Xu Yao
Zhe Liu
Xianbao Shen
Jingyi Mao
author_facet Yu-Qing Song
Xu Yao
Zhe Liu
Xianbao Shen
Jingyi Mao
author_sort Yu-Qing Song
collection DOAJ
description The C4.5 algorithm has three shortcomings: the wide range of candidate segmentation thresholds for continuous attributes, the combined influence of different attributes and of local subsets under the same attribute, and redundancy between attributes. For continuous attributes, sampling and threshold supplementation are performed near the transition boundaries between attribute intervals that correspond to adjacent, different categories, which narrows the range of candidate segmentation thresholds. The C4.5 information gain calculation is optimized by adding the standardized Euclidean distance of the attribute's global and local factors as an attribute weight. The influence of redundancy between attributes is greatly reduced by averaging the Gini index of the other attributes and adding a correction factor. The overall average improvement is 0.6% to 2.1% for the base classifier and 0.7% to 2.7% for the bagging integration classifier, which shows that the integration model improves classification accuracy and supports its feasibility and reliability.
first_indexed 2024-04-12T04:46:19Z
format Article
id doaj.art-f4598651786b4567ae0bde63441d990d
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-04-12T04:46:19Z
publishDate 2020-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-f4598651786b4567ae0bde63441d990d
2022-12-22T03:47:30Z
eng
IEEE
IEEE Access
ISSN 2169-3536
2020-01-01
Volume 8, pages 206866-206875
DOI 10.1109/ACCESS.2020.3032291
9231272
An Improved C4.5 Algorithm in Bagging Integration Model
Yu-Qing Song
Xu Yao (https://orcid.org/0000-0001-8136-9528)
Zhe Liu (https://orcid.org/0000-0002-1197-0390)
Xianbao Shen
Jingyi Mao
Affiliation (all five authors): School of Computer Science and Telecommunication, Jiangsu University, Zhenjiang, China
https://ieeexplore.ieee.org/document/9231272/
Bagging integration
C4.5 algorithm
information entropy
split information
spellingShingle Yu-Qing Song
Xu Yao
Zhe Liu
Xianbao Shen
Jingyi Mao
An Improved C4.5 Algorithm in Bagging Integration Model
IEEE Access
Bagging integration
C4.5 algorithm
information entropy
split information
title An Improved C4.5 Algorithm in Bagging Integration Model
title_full An Improved C4.5 Algorithm in Bagging Integration Model
title_fullStr An Improved C4.5 Algorithm in Bagging Integration Model
title_full_unstemmed An Improved C4.5 Algorithm in Bagging Integration Model
title_short An Improved C4.5 Algorithm in Bagging Integration Model
title_sort improved c4 5 algorithm in bagging integration model
topic Bagging integration
C4.5 algorithm
information entropy
split information
url https://ieeexplore.ieee.org/document/9231272/