An Approach for Mining Imbalanced Datasets Combining Specialized Oversampling and Undersampling Methods
The paper proposes an approach for mining imbalanced datasets combining specialized oversampling and undersampling methods. The oversampling part produces a set of non-dominated synthetic examples using two, possibly conflicting, criteria including classification potential and the distance from the...
Main Authors: | Joanna Jedrzejowicz, Piotr Jedrzejowicz |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2023-01-01 |
Series: | IEEE Access |
Subjects: | Dominance relation; gene expression programming; imbalanced datasets; oversampling; undersampling |
Online Access: | https://ieeexplore.ieee.org/document/10339319/ |
_version_ | 1827586142166319104 |
---|---|
author | Joanna Jedrzejowicz; Piotr Jedrzejowicz |
author_facet | Joanna Jedrzejowicz; Piotr Jedrzejowicz |
author_sort | Joanna Jedrzejowicz |
collection | DOAJ |
description | The paper proposes an approach for mining imbalanced datasets that combines specialized oversampling and undersampling methods. The oversampling part produces a set of non-dominated synthetic examples using two, possibly conflicting, criteria: classification potential and the distance from the borderline between the minority and majority classes. The undersampling part removes from the majority class those examples that are likely to cause mistakes and disturbances in the mining process. To validate the approach, an extensive computational experiment has been carried out. The performance of the proposed approach has been compared with that of several leading algorithms proposed for balancing minority and majority datasets. To ensure fair comparisons, a single learner based on Gene Expression Programming (GEP) has been used in all cases. The experimental results confirmed that the proposed approach outperforms the other methods investigated in the experiment. |
first_indexed | 2024-03-08T23:57:45Z |
format | Article |
id | doaj.art-6afd0010753f42e69ee4e696c2c234f7 |
institution | Directory of Open Access Journals |
issn | 2169-3536 |
language | English |
last_indexed | 2024-03-08T23:57:45Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-6afd0010753f42e69ee4e696c2c234f7; 2023-12-13T00:01:07Z; eng; IEEE; IEEE Access; 2169-3536; 2023-01-01; 11; 136782; 136792; 10.1109/ACCESS.2023.3339124; 10339319; An Approach for Mining Imbalanced Datasets Combining Specialized Oversampling and Undersampling Methods; Joanna Jedrzejowicz; 0; https://orcid.org/0000-0003-4979-5476; Piotr Jedrzejowicz; 1; https://orcid.org/0000-0001-6104-1381; Institute of Informatics, Faculty of Mathematics, Physics and Informatics, University of Gdańsk, Gdańsk, Poland; Department of Information Systems, Gdynia Maritime University, Gdynia, Poland; The paper proposes an approach for mining imbalanced datasets that combines specialized oversampling and undersampling methods. The oversampling part produces a set of non-dominated synthetic examples using two, possibly conflicting, criteria: classification potential and the distance from the borderline between the minority and majority classes. The undersampling part removes from the majority class those examples that are likely to cause mistakes and disturbances in the mining process. To validate the approach, an extensive computational experiment has been carried out. The performance of the proposed approach has been compared with that of several leading algorithms proposed for balancing minority and majority datasets. To ensure fair comparisons, a single learner based on Gene Expression Programming (GEP) has been used in all cases. The experimental results confirmed that the proposed approach outperforms the other methods investigated in the experiment.; https://ieeexplore.ieee.org/document/10339319/; Dominance relation; gene expression programming; imbalanced datasets; oversampling; undersampling |
spellingShingle | Joanna Jedrzejowicz; Piotr Jedrzejowicz; An Approach for Mining Imbalanced Datasets Combining Specialized Oversampling and Undersampling Methods; IEEE Access; Dominance relation; gene expression programming; imbalanced datasets; oversampling; undersampling |
title | An Approach for Mining Imbalanced Datasets Combining Specialized Oversampling and Undersampling Methods |
title_full | An Approach for Mining Imbalanced Datasets Combining Specialized Oversampling and Undersampling Methods |
title_fullStr | An Approach for Mining Imbalanced Datasets Combining Specialized Oversampling and Undersampling Methods |
title_full_unstemmed | An Approach for Mining Imbalanced Datasets Combining Specialized Oversampling and Undersampling Methods |
title_short | An Approach for Mining Imbalanced Datasets Combining Specialized Oversampling and Undersampling Methods |
title_sort | approach for mining imbalanced datasets combining specialized oversampling and undersampling methods |
topic | Dominance relation; gene expression programming; imbalanced datasets; oversampling; undersampling |
url | https://ieeexplore.ieee.org/document/10339319/ |
work_keys_str_mv | AT joannajedrzejowicz anapproachforminingimbalanceddatasetscombiningspecializedoversamplingandundersamplingmethods AT piotrjedrzejowicz anapproachforminingimbalanceddatasetscombiningspecializedoversamplingandundersamplingmethods AT joannajedrzejowicz approachforminingimbalanceddatasetscombiningspecializedoversamplingandundersamplingmethods AT piotrjedrzejowicz approachforminingimbalanceddatasetscombiningspecializedoversamplingandundersamplingmethods |
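
The description field above says the oversampling step keeps a set of non-dominated synthetic examples under two possibly conflicting criteria. The record does not say how the paper computes either criterion, so the following is only a minimal sketch of a dominance filter, not the authors' method. It assumes each candidate is already scored as a pair (classification potential, distance from the borderline), that a higher potential is better, and that a smaller distance is better. The function names and the sample scores are hypothetical.

```python
from typing import List, Tuple

# Hypothetical score pair: (classification_potential, distance_from_borderline).
Candidate = Tuple[float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a dominates b.

    Assumption (not stated in the record): higher potential is better
    and smaller distance from the borderline is better.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def non_dominated(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Made-up scores for five synthetic candidates.
candidates = [(0.9, 0.5), (0.7, 0.2), (0.6, 0.4), (0.9, 0.6), (0.3, 0.1)]
front = non_dominated(candidates)
print(front)  # [(0.9, 0.5), (0.7, 0.2), (0.3, 0.1)]
```

In this sample, (0.6, 0.4) is dropped because (0.7, 0.2) is better on both criteria, and (0.9, 0.6) is dropped because (0.9, 0.5) matches its potential at a smaller distance. The three remaining candidates each trade one criterion against the other.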