A multi-manifold learning based instance weighting and under-sampling for imbalanced data classification problems
Abstract: Under-sampling is a technique for overcoming the class imbalance problem; however, selecting the instances to be dropped and measuring their informativeness is an important concern. This paper tries to bring up a new point of view in this regard and exploit the structure of data to decide on the...
Main Authors: | Tayyebe Feizi, Mohammad Hossein Moattar, Hamid Tabatabaee |
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen 2023-10-01 |
Series: | Journal of Big Data |
Subjects: | Imbalanced data, Classification, Under-sampling, Multi-Manifold learning |
Online Access: | https://doi.org/10.1186/s40537-023-00832-2 |
_version_ | 1797452317615718400 |
---|---|
author | Tayyebe Feizi; Mohammad Hossein Moattar; Hamid Tabatabaee |
author_facet | Tayyebe Feizi; Mohammad Hossein Moattar; Hamid Tabatabaee |
author_sort | Tayyebe Feizi |
collection | DOAJ |
description | Abstract: Under-sampling is a technique for overcoming the class imbalance problem; however, selecting the instances to be dropped and measuring their informativeness is an important concern. This paper tries to bring up a new point of view in this regard and exploits the structure of the data to decide on the importance of the data points. For this purpose, a multi-manifold learning approach is proposed. Manifolds represent the underlying structures of data and can help extract the latent space of the data distribution. However, there is no evidence that a single manifold can be relied on to extract the local neighborhood of the dataset. Therefore, this paper proposes an ensemble of manifold learning approaches and evaluates each manifold with an information loss-based heuristic. Once the optimality score of each manifold has been computed, the centrality and marginality degrees of the samples are computed on the manifolds and weighted by the corresponding score. A gradual elimination approach is proposed, which tries to balance the classes while avoiding a drop in the F-measure on the validation dataset. The proposed method is evaluated on 22 imbalanced datasets from the KEEL and UCI repositories with different classification measures. The experimental results show that the proposed approach is more effective than similar approaches and far better than previous ones, especially when the imbalance ratio is very high. |
first_indexed | 2024-03-09T15:06:50Z |
format | Article |
id | doaj.art-85c2fedeec624ed499505d9f65c1a53a |
institution | Directory of Open Access Journals |
issn | 2196-1115 |
language | English |
last_indexed | 2024-03-09T15:06:50Z |
publishDate | 2023-10-01 |
publisher | SpringerOpen |
record_format | Article |
series | Journal of Big Data |
spelling | doaj.art-85c2fedeec624ed499505d9f65c1a53a; 2023-11-26T13:35:38Z; eng; SpringerOpen; Journal of Big Data; 2196-1115; 2023-10-01; 101136; 10.1186/s40537-023-00832-2; A multi-manifold learning based instance weighting and under-sampling for imbalanced data classification problems; Tayyebe Feizi 0; Mohammad Hossein Moattar 1; Hamid Tabatabaee 2; Department of Computer Engineering, Mashhad Branch, Islamic Azad University; Department of Computer Engineering, Mashhad Branch, Islamic Azad University; Department of Computer Engineering, Mashhad Branch, Islamic Azad University; Abstract: Under-sampling is a technique for overcoming the class imbalance problem; however, selecting the instances to be dropped and measuring their informativeness is an important concern. This paper tries to bring up a new point of view in this regard and exploits the structure of the data to decide on the importance of the data points. For this purpose, a multi-manifold learning approach is proposed. Manifolds represent the underlying structures of data and can help extract the latent space of the data distribution. However, there is no evidence that a single manifold can be relied on to extract the local neighborhood of the dataset. Therefore, this paper proposes an ensemble of manifold learning approaches and evaluates each manifold with an information loss-based heuristic. Once the optimality score of each manifold has been computed, the centrality and marginality degrees of the samples are computed on the manifolds and weighted by the corresponding score. A gradual elimination approach is proposed, which tries to balance the classes while avoiding a drop in the F-measure on the validation dataset. The proposed method is evaluated on 22 imbalanced datasets from the KEEL and UCI repositories with different classification measures. The experimental results show that the proposed approach is more effective than similar approaches and far better than previous ones, especially when the imbalance ratio is very high.; https://doi.org/10.1186/s40537-023-00832-2; Imbalanced data; Classification; Under-sampling; Multi-Manifold learning |
spellingShingle | Tayyebe Feizi Mohammad Hossein Moattar Hamid Tabatabaee A multi-manifold learning based instance weighting and under-sampling for imbalanced data classification problems Journal of Big Data Imbalanced data Classification Under-sampling Multi-Manifold learning |
title | A multi-manifold learning based instance weighting and under-sampling for imbalanced data classification problems |
title_full | A multi-manifold learning based instance weighting and under-sampling for imbalanced data classification problems |
title_fullStr | A multi-manifold learning based instance weighting and under-sampling for imbalanced data classification problems |
title_full_unstemmed | A multi-manifold learning based instance weighting and under-sampling for imbalanced data classification problems |
title_short | A multi-manifold learning based instance weighting and under-sampling for imbalanced data classification problems |
title_sort | multi manifold learning based instance weighting and under sampling for imbalanced data classification problems |
topic | Imbalanced data; Classification; Under-sampling; Multi-Manifold learning |
url | https://doi.org/10.1186/s40537-023-00832-2 |
work_keys_str_mv | AT tayyebefeizi amultimanifoldlearningbasedinstanceweightingandundersamplingforimbalanceddataclassificationproblems AT mohammadhosseinmoattar amultimanifoldlearningbasedinstanceweightingandundersamplingforimbalanceddataclassificationproblems AT hamidtabatabaee amultimanifoldlearningbasedinstanceweightingandundersamplingforimbalanceddataclassificationproblems AT tayyebefeizi multimanifoldlearningbasedinstanceweightingandundersamplingforimbalanceddataclassificationproblems AT mohammadhosseinmoattar multimanifoldlearningbasedinstanceweightingandundersamplingforimbalanceddataclassificationproblems AT hamidtabatabaee multimanifoldlearningbasedinstanceweightingandundersamplingforimbalanceddataclassificationproblems |
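
The gradual elimination step mentioned in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the batch size `step`, the `evaluate` callback, and the stopping rule are assumptions made for illustration, and `weights` simply stands in for the manifold-weighted centrality and marginality scores, whose exact definition the record does not give.

```python
from typing import Callable, List, Sequence


def f_measure(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 score of the positive (minority) class; 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def gradual_undersample(
    majority_ids: Sequence[int],
    weights: Sequence[float],
    n_minority: int,
    evaluate: Callable[[List[int]], float],
    step: int = 1,
) -> List[int]:
    """Drop the least important majority instances in batches (hypothetical helper).

    Stops when the majority class is no larger than the minority class, or when
    removing the next batch would lower the validation score returned by
    `evaluate` (for example, the F-measure of a classifier retrained on the
    kept instances).
    """
    # Sort by importance, highest first, so the tail holds the removal candidates.
    order = sorted(range(len(majority_ids)), key=lambda i: weights[i], reverse=True)
    kept = [majority_ids[i] for i in order]
    best = evaluate(kept)
    while len(kept) > n_minority:
        n_drop = min(step, len(kept) - n_minority)
        candidate = kept[:-n_drop]
        score = evaluate(candidate)
        if score < best:
            break  # assumed stopping rule: keep the last subset that did not degrade
        kept, best = candidate, score
    return kept
```

In practice `evaluate` would retrain a classifier on the kept majority instances plus all minority instances and return the validation F-measure; passing it as a callback keeps the sketch independent of any particular classifier.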