AUC4.5: AUC-Based C4.5 Decision Tree Algorithm for Imbalanced Data Classification

This paper presents a modification of Quinlan’s C4.5 algorithm for imbalanced data classification. While the C4.5 algorithm uses the difference in information entropy to determine the goodness of a split, the proposed method, which is named AUC4.5, examines the difference in the area unde...

Bibliographic Details
Main Author: Jong-Seok Lee
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Subjects: Area under ROC curve; C4.5; classification; imbalanced data; tree induction
Online Access: https://ieeexplore.ieee.org/document/8779617/
author Jong-Seok Lee
collection DOAJ
description This paper presents a modification of Quinlan’s C4.5 algorithm for imbalanced data classification. Whereas C4.5 uses the difference in information entropy to measure the goodness of a split, the proposed method, named AUC4.5, uses the difference in the area under the ROC curve (AUC) produced by a split. The method thus attempts to maximize the AUC of the trained decision tree in order to cope with class imbalance in the data. An extensive experimental study was performed on 20 real datasets from the machine learning repository at the University of California, Irvine. The proposed AUC4.5 algorithm showed better classification performance than both the standard and cost-sensitive C4.5 algorithms.
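The contrast drawn in the description, entropy-based versus AUC-based split scoring, can be illustrated with a minimal Python sketch. It is not the paper's exact formulation, which this record does not give. It assumes binary labels (1 = minority class), a single binary split, and leaves scored by their positive-class fraction; under those assumptions the two-leaf ROC curve has one interior point and its area is 0.5 + |TPR - FPR| / 2. The function names (entropy, information_gain, split_auc, auc_gain) are hypothetical.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)


def information_gain(labels, goes_left):
    """Entropy reduction of a binary split (the C4.5-style criterion,
    without the gain-ratio normalisation)."""
    left = [y for y, g in zip(labels, goes_left) if g]
    right = [y for y, g in zip(labels, goes_left) if not g]
    n = len(labels)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - children


def split_auc(labels, goes_left):
    """AUC of the two-leaf classifier induced by a binary split.

    Assumption of this sketch: each leaf is scored by its positive
    fraction, so the ROC curve passes through (0,0), (FPR,TPR), (1,1).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        return 0.5  # AUC is undefined for a pure node; treat as uninformative
    tpr = sum(1 for y, g in zip(labels, goes_left) if g and y == 1) / pos
    fpr = sum(1 for y, g in zip(labels, goes_left) if g and y == 0) / neg
    return 0.5 + abs(tpr - fpr) / 2


def auc_gain(labels, goes_left):
    """AUC improvement over the unsplit node, whose AUC is 0.5."""
    return split_auc(labels, goes_left) - 0.5


# Imbalanced toy node: 2 positives, 18 negatives.
labels = [1, 1] + [0] * 18
# Candidate split sending both positives and 2 negatives to the left.
goes_left = [True] * 4 + [False] * 16
print(information_gain(labels, goes_left), auc_gain(labels, goes_left))
```

In this sketch the AUC-based score depends only on the per-class rates (TPR and FPR), not on the raw class counts, which is one plausible reading of why an AUC criterion is less sensitive to class imbalance than an entropy criterion.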
first_indexed 2024-04-12T05:26:52Z
format Article
id doaj.art-805b030ac56b409c9b7ac9099d734bbe
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-04-12T05:26:52Z
publishDate 2019-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-805b030ac56b409c9b7ac9099d734bbe | 2022-12-22T03:46:15Z | eng | IEEE | IEEE Access | 2169-3536 | 2019-01-01 | 7 | 106034 | 106042 | 10.1109/ACCESS.2019.2931865 | 8779617 | AUC4.5: AUC-Based C4.5 Decision Tree Algorithm for Imbalanced Data Classification | Jong-Seok Lee | 0 | https://orcid.org/0000-0001-5255-4425 | Department of Industrial Engineering, Sungkyunkwan University, Suwon, South Korea | https://ieeexplore.ieee.org/document/8779617/ | Area under ROC curve | C4.5 | classification | imbalanced data | tree induction
topic Area under ROC curve
C4.5
classification
imbalanced data
tree induction
url https://ieeexplore.ieee.org/document/8779617/