An Improved C4.5 Algorithm in Bagging Integration Model
The C4.5 algorithm has three shortcomings: the wide range of candidate segmentation threshold sequences for continuous attributes, the comprehensive influence of different attributes and local subsets under the same attribute, and the inter-attribute redundancy. When dealing with continuous attribut...
Main Authors: | Yu-Qing Song, Xu Yao, Zhe Liu, Xianbao Shen, Jingyi Mao |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2020-01-01 |
Series: | IEEE Access |
Subjects: | Bagging integration; C45 algorithm; information entropy; split information |
Online Access: | https://ieeexplore.ieee.org/document/9231272/ |
_version_ | 1811209874197970944 |
---|---|
author | Yu-Qing Song; Xu Yao; Zhe Liu; Xianbao Shen; Jingyi Mao |
author_facet | Yu-Qing Song; Xu Yao; Zhe Liu; Xianbao Shen; Jingyi Mao |
author_sort | Yu-Qing Song |
collection | DOAJ |
description | The C4.5 algorithm has three shortcomings: the wide range of candidate segmentation threshold sequences for continuous attributes, the comprehensive influence of different attributes and of local subsets under the same attribute, and inter-attribute redundancy. For continuous attributes, sampling and threshold supplementation are performed near the transition boundaries between attribute intervals that correspond to adjacent, different categories, which narrows the range of candidate segmentation threshold sequences. The C4.5 information gain calculation is optimized by adding the standardized Euclidean distance of the attribute's global and local factors to represent attribute weight. Averaging the Gini index of the other attributes and adding a correction factor greatly reduces the influence of redundancy between attributes. The overall average improvement of the base classifier and of the bagging integration classifier is 0.6% to 2.1% and 0.7% to 2.7%, respectively, which shows that the integration model improves classification accuracy and supports its feasibility and reliability. |
first_indexed | 2024-04-12T04:46:19Z |
format | Article |
id | doaj.art-f4598651786b4567ae0bde63441d990d |
institution | Directory of Open Access Journals |
issn | 2169-3536 |
language | English |
last_indexed | 2024-04-12T04:46:19Z |
publishDate | 2020-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-f4598651786b4567ae0bde63441d990d; 2022-12-22T03:47:30Z; eng; IEEE; IEEE Access; 2169-3536; 2020-01-01; 8; 206866; 206875; 10.1109/ACCESS.2020.3032291; 9231272; An Improved C4.5 Algorithm in Bagging Integration Model; Yu-Qing Song; Xu Yao (https://orcid.org/0000-0001-8136-9528); Zhe Liu (https://orcid.org/0000-0002-1197-0390); Xianbao Shen; Jingyi Mao; School of Computer Science and Telecommunication, Jiangsu University, Zhenjiang, China (all five authors); https://ieeexplore.ieee.org/document/9231272/; Bagging integration; C45 algorithm; information entropy; split information |
title | An Improved C4.5 Algorithm in Bagging Integration Model |
topic | Bagging integration; C45 algorithm; information entropy; split information |
url | https://ieeexplore.ieee.org/document/9231272/ |
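
The description in this record refers to candidate segmentation thresholds for continuous attributes and to the C4.5 information gain and split information. As a rough illustration of those baseline ideas only, the following Python sketch computes the textbook gain ratio for a binary split and restricts candidate thresholds to class-transition boundaries. It is not the improved method of the article: the function names (`entropy`, `boundary_thresholds`, `gain_ratio`, `best_threshold`) are hypothetical, the boundary rule is a standard simplification rather than the authors' sampling and threshold supplementation step, and scoring thresholds by gain ratio is an assumption made for brevity.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def boundary_thresholds(values, labels):
    """Midpoints between adjacent distinct values where the class can change.

    A midpoint is skipped only when both neighbouring values carry the same
    single class, so no class transition happens there.
    """
    classes_at = {}
    for v, y in zip(values, labels):
        classes_at.setdefault(v, set()).add(y)
    distinct = sorted(classes_at)
    thresholds = []
    for lo, hi in zip(distinct, distinct[1:]):
        same_pure_class = len(classes_at[lo]) == 1 and classes_at[lo] == classes_at[hi]
        if not same_pure_class:
            thresholds.append((lo + hi) / 2)
    return thresholds


def gain_ratio(values, labels, threshold):
    """Textbook gain ratio of the binary split `value <= threshold`."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # degenerate split: nothing is separated
    w_left, w_right = len(left) / len(labels), len(right) / len(labels)
    gain = entropy(labels) - w_left * entropy(left) - w_right * entropy(right)
    split_info = -(w_left * math.log2(w_left) + w_right * math.log2(w_right))
    return gain / split_info


def best_threshold(values, labels):
    """Boundary threshold with the highest gain ratio, or None if there is none."""
    candidates = boundary_thresholds(values, labels)
    if not candidates:
        return None
    return max(candidates, key=lambda t: gain_ratio(values, labels, t))


# Toy data (invented for illustration): one continuous attribute, two classes.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
labels = ["a", "a", "a", "b", "b", "b"]
print(boundary_thresholds(values, labels))
print(best_threshold(values, labels))
```

On this toy input only one midpoint lies at a class transition, so a single candidate is evaluated instead of the five midpoints a naive scan would test; this is the kind of narrowing of the candidate range that the description alludes to, here in its simplest form.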