AUC4.5: AUC-Based C4.5 Decision Tree Algorithm for Imbalanced Data Classification
This paper presents a modification of Quinlan’s C4.5 algorithm for imbalanced data classification. While the C4.5 algorithm uses the difference in information entropy to determine the goodness of a split, the proposed method, which is named AUC4.5, examines the difference in the area unde...
Main Author: | Jong-Seok Lee |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2019-01-01 |
Series: | IEEE Access |
Subjects: | Area under ROC curve; C4.5; classification; imbalanced data; tree induction |
Online Access: | https://ieeexplore.ieee.org/document/8779617/ |
_version_ | 1811212299963203584 |
---|---|
author | Jong-Seok Lee |
author_facet | Jong-Seok Lee |
author_sort | Jong-Seok Lee |
collection | DOAJ |
description | This paper presents a modification of Quinlan’s C4.5 algorithm for imbalanced data classification. While the C4.5 algorithm uses the difference in information entropy to determine the goodness of a split, the proposed method, named AUC4.5, examines the difference in the area under the ROC curve (AUC) of a split. That is, the method attempts to maximize the AUC of the trained decision tree in order to cope with class imbalance in the data. An extensive experimental study was performed on 20 real datasets from the machine learning repository at the University of California, Irvine. The proposed AUC4.5 algorithm showed better classification performance than both the standard and cost-sensitive C4.5 algorithms. |
first_indexed | 2024-04-12T05:26:52Z |
format | Article |
id | doaj.art-805b030ac56b409c9b7ac9099d734bbe |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-04-12T05:26:52Z |
publishDate | 2019-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-805b030ac56b409c9b7ac9099d734bbe; 2022-12-22T03:46:15Z; eng; IEEE; IEEE Access; 2169-3536; 2019-01-01; 7; 106034; 106042; 10.1109/ACCESS.2019.2931865; 8779617; AUC4.5: AUC-Based C4.5 Decision Tree Algorithm for Imbalanced Data Classification; Jong-Seok Lee; 0; https://orcid.org/0000-0001-5255-4425; Department of Industrial Engineering, Sungkyunkwan University, Suwon, South Korea; This paper presents a modification of Quinlan’s C4.5 algorithm for imbalanced data classification. While the C4.5 algorithm uses the difference in information entropy to determine the goodness of a split, the proposed method, which is named AUC4.5, examines the difference in the area under the ROC curve (AUC) of a split. It implies that our method attempts to maximize the AUC value of a trained decision tree in order to cope with class imbalance in data. An extensive experimental study was performed on 20 real datasets from the machine learning repository at the University of California at Irvine, Irvine. The proposed AUC4.5 algorithm showed better classification than both the standard and cost-sensitive C4.5 algorithms.; https://ieeexplore.ieee.org/document/8779617/; Area under ROC curve; C4.5; classification; imbalanced data; tree induction |
spellingShingle | Jong-Seok Lee AUC4.5: AUC-Based C4.5 Decision Tree Algorithm for Imbalanced Data Classification IEEE Access Area under ROC curve C4.5 classification imbalanced data tree induction |
title | AUC4.5: AUC-Based C4.5 Decision Tree Algorithm for Imbalanced Data Classification |
title_full | AUC4.5: AUC-Based C4.5 Decision Tree Algorithm for Imbalanced Data Classification |
title_fullStr | AUC4.5: AUC-Based C4.5 Decision Tree Algorithm for Imbalanced Data Classification |
title_full_unstemmed | AUC4.5: AUC-Based C4.5 Decision Tree Algorithm for Imbalanced Data Classification |
title_short | AUC4.5: AUC-Based C4.5 Decision Tree Algorithm for Imbalanced Data Classification |
title_sort | auc4 5 auc based c4 5 decision tree algorithm for imbalanced data classification |
topic | Area under ROC curve; C4.5; classification; imbalanced data; tree induction |
url | https://ieeexplore.ieee.org/document/8779617/ |
work_keys_str_mv | AT jongseoklee auc45aucbasedc45decisiontreealgorithmforimbalanceddataclassification |
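
The description contrasts two ways of scoring a candidate split: the entropy difference used by C4.5 and an AUC difference used by AUC4.5. The record does not give the paper's formulas, so the following Python sketch is only an illustration under stated assumptions: a two-class problem, a binary split, and an AUC gain defined as the AUC obtained by ranking the two branches by their positive-class fraction minus the 0.5 baseline of an unsplit node. The function names (`entropy`, `information_gain`, `split_auc`, `auc_gain`) and the example counts are hypothetical and are not taken from the paper.

```python
import math


def entropy(pos, neg):
    """Shannon entropy (in bits) of a node with pos/neg class counts."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count > 0:
            frac = count / total
            h -= frac * math.log2(frac)
    return h


def information_gain(left, right):
    """Entropy difference of a binary split (the C4.5-style criterion).

    left and right are (pos, neg) count pairs for the two branches.
    """
    (pl, nl), (pr, nr) = left, right
    total = pl + nl + pr + nr
    parent = entropy(pl + pr, nl + nr)
    children = ((pl + nl) / total) * entropy(pl, nl) \
        + ((pr + nr) / total) * entropy(pr, nr)
    return parent - children


def split_auc(left, right):
    """AUC of the two-leaf ranking induced by a binary split.

    Assumption: each branch scores its examples by the branch's
    positive fraction; ties inside a branch count as half a win
    (Mann-Whitney form of the AUC).
    """
    def positive_rate(branch):
        pos, neg = branch
        return pos / (pos + neg) if pos + neg > 0 else 0.0

    # hi is the branch with the larger positive fraction.
    hi, lo = sorted([left, right], key=positive_rate, reverse=True)
    (p_hi, n_hi), (p_lo, n_lo) = hi, lo
    pos_total, neg_total = p_hi + p_lo, n_hi + n_lo
    if pos_total == 0 or neg_total == 0:
        return 0.5  # AUC is undefined for a single class; use the baseline
    wins = p_hi * n_lo
    ties = 0.5 * (p_hi * n_hi + p_lo * n_lo)
    return (wins + ties) / (pos_total * neg_total)


def auc_gain(left, right):
    """AUC difference relative to an unsplit node (baseline AUC 0.5)."""
    return split_auc(left, right) - 0.5


if __name__ == "__main__":
    # Hypothetical imbalanced node: 10 positives, 90 negatives.
    candidates = {
        "split A": ((8, 10), (2, 80)),
        "split B": ((0, 40), (10, 50)),
    }
    for name, (left, right) in candidates.items():
        print(name,
              "information gain:", round(information_gain(left, right), 4),
              "AUC gain:", round(auc_gain(left, right), 4))
```

A tree learner built on this sketch would pick, at each node, the candidate split with the largest `auc_gain` instead of the largest `information_gain`; how the published algorithm handles multiway splits, continuous attributes, and pruning is not stated in this record.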