An Approach for Mining Imbalanced Datasets Combining Specialized Oversampling and Undersampling Methods

The paper proposes an approach for mining imbalanced datasets combining specialized oversampling and undersampling methods. The oversampling part produces a set of non-dominated synthetic examples using two, possibly conflicting, criteria including classification potential and the distance from the...


Bibliographic Details
Main Authors: Joanna Jedrzejowicz, Piotr Jedrzejowicz
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Subjects: Dominance relation; gene expression programming; imbalanced datasets; oversampling; undersampling
Online Access: https://ieeexplore.ieee.org/document/10339319/
_version_ 1827586142166319104
author Joanna Jedrzejowicz
Piotr Jedrzejowicz
author_facet Joanna Jedrzejowicz
Piotr Jedrzejowicz
author_sort Joanna Jedrzejowicz
collection DOAJ
description The paper proposes an approach for mining imbalanced datasets that combines specialized oversampling and undersampling methods. The oversampling part produces a set of non-dominated synthetic examples using two possibly conflicting criteria: classification potential and the distance from the borderline between the minority and majority classes. The undersampling part removes from the majority class those examples that are likely to cause mistakes and disturbances in the mining process. To validate the approach, an extensive computational experiment has been carried out. The performance of the proposed approach has been compared with that of several leading algorithms proposed for balancing minority and majority datasets. To ensure fair comparisons, a single learner based on Gene Expression Programming (GEP) has been used in all cases. The experimental results confirmed that the proposed approach outperforms the other methods investigated in the experiment.
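The description refers to keeping only non-dominated synthetic examples under two criteria. As a rough illustration of what a dominance filter does, the sketch below keeps the candidates that no other candidate beats on both criteria. It is not the authors' implementation: the function names, the sample scores, and the assumption that both criteria are to be maximized are hypothetical, and the record does not say how the paper computes classification potential or borderline distance.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is at least as good as b on every criterion
    and strictly better on at least one (assumes maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated(scores: Sequence[Sequence[float]]) -> List[int]:
    """Return the indices of score vectors that no other vector dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical (criterion_1, criterion_2) scores for five synthetic candidates.
candidate_scores = [(0.9, 0.1), (0.5, 0.5), (0.4, 0.4), (0.1, 0.9), (0.5, 0.2)]

# Candidates 2 and 4 are dominated by candidate 1, so they are dropped.
kept = non_dominated(candidate_scores)
print(kept)
```

The pairwise check is quadratic in the number of candidates, which is acceptable for a small illustration; a larger candidate pool would call for a sorted sweep instead.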
first_indexed 2024-03-08T23:57:45Z
format Article
id doaj.art-6afd0010753f42e69ee4e696c2c234f7
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-03-08T23:57:45Z
publishDate 2023-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-6afd0010753f42e69ee4e696c2c234f7
2023-12-13T00:01:07Z
eng
IEEE
IEEE Access
2169-3536
2023-01-01
volume 11, pages 136782-136792
10.1109/ACCESS.2023.3339124
10339319
An Approach for Mining Imbalanced Datasets Combining Specialized Oversampling and Undersampling Methods
Joanna Jedrzejowicz (0) https://orcid.org/0000-0003-4979-5476
Piotr Jedrzejowicz (1) https://orcid.org/0000-0001-6104-1381
(0) Institute of Informatics, Faculty of Mathematics, Physics and Informatics, University of Gdańsk, Gdańsk, Poland
(1) Department of Information Systems, Gdynia Maritime University, Gdynia, Poland
https://ieeexplore.ieee.org/document/10339319/
Dominance relation
gene expression programming
imbalanced datasets
oversampling
undersampling
spellingShingle Joanna Jedrzejowicz
Piotr Jedrzejowicz
An Approach for Mining Imbalanced Datasets Combining Specialized Oversampling and Undersampling Methods
IEEE Access
Dominance relation
gene expression programming
imbalanced datasets
oversampling
undersampling
title An Approach for Mining Imbalanced Datasets Combining Specialized Oversampling and Undersampling Methods
title_full An Approach for Mining Imbalanced Datasets Combining Specialized Oversampling and Undersampling Methods
title_fullStr An Approach for Mining Imbalanced Datasets Combining Specialized Oversampling and Undersampling Methods
title_full_unstemmed An Approach for Mining Imbalanced Datasets Combining Specialized Oversampling and Undersampling Methods
title_short An Approach for Mining Imbalanced Datasets Combining Specialized Oversampling and Undersampling Methods
title_sort approach for mining imbalanced datasets combining specialized oversampling and undersampling methods
topic Dominance relation
gene expression programming
imbalanced datasets
oversampling
undersampling
url https://ieeexplore.ieee.org/document/10339319/