A multi-manifold learning based instance weighting and under-sampling for imbalanced data classification problems

Abstract Under-sampling is a technique for overcoming the class imbalance problem; however, selecting the instances to be dropped and measuring their informativeness is an important concern. This paper offers a new point of view in this regard and exploits the structure of the data to decide on the...

Bibliographic Details
Main Authors: Tayyebe Feizi, Mohammad Hossein Moattar, Hamid Tabatabaee
Format: Article
Language: English
Published: SpringerOpen 2023-10-01
Series: Journal of Big Data
Subjects: Imbalanced data, Classification, Under-sampling, Multi-Manifold learning
Online Access: https://doi.org/10.1186/s40537-023-00832-2
Collection: DOAJ
Description: Abstract Under-sampling is a technique for overcoming the class imbalance problem; however, selecting the instances to be dropped and measuring their informativeness is an important concern. This paper offers a new point of view in this regard and exploits the structure of the data to decide on the importance of the data points. For this purpose, a multi-manifold learning approach is proposed. Manifolds represent the underlying structures of data and can help extract the latent space of the data distribution. However, there is no evidence that a single manifold can be relied on to extract the local neighborhood of the dataset. Therefore, this paper proposes an ensemble of manifold learning approaches and evaluates each manifold with an information loss-based heuristic. Once the optimality score of each manifold has been computed, the centrality and marginality degrees of the samples are computed on the manifolds and weighted by the corresponding score. A gradual elimination approach is proposed, which tries to balance the classes while avoiding a drop in the F-measure on the validation dataset. The proposed method is evaluated on 22 imbalanced datasets from the KEEL and UCI repositories with different classification measures. The experimental results demonstrate that the proposed approach is more effective than other similar approaches and is far better than previous approaches, especially when the imbalance ratio is very high.
Institution: Directory of Open Access Journals
ISSN: 2196-1115
Affiliation (all authors): Department of Computer Engineering, Mashhad Branch, Islamic Azad University
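The gradual elimination step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the per-sample `weights` are assumed to be given (in the paper they would come from the manifold-based centrality and marginality scores), the 1-nearest-neighbour classifier is a stand-in chosen only to keep the example self-contained, and the function names, the `step` parameter, and the "stop at the first drop in F-measure" rule are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def f_measure(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 score for the positive (minority) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def predict_1nn(x: Sequence[Point], y: Sequence[int], kept: Sequence[int], query: Point) -> int:
    """Label of the nearest kept training point (stand-in classifier, an assumption)."""
    nearest = min(kept, key=lambda i: sum((a - b) ** 2 for a, b in zip(x[i], query)))
    return y[nearest]


def gradual_undersample(
    x: Sequence[Point], y: Sequence[int], weights: Sequence[float],
    val_x: Sequence[Point], val_y: Sequence[int],
    majority: int = 0, minority: int = 1, step: int = 1,
) -> List[int]:
    """Drop low-weight majority samples in batches until the classes are
    balanced or the validation F-measure would drop. Returns kept indices."""
    def score(kept: Sequence[int]) -> float:
        preds = [predict_1nn(x, y, kept, q) for q in val_x]
        return f_measure(val_y, preds, positive=minority)

    kept = list(range(len(x)))
    best = score(kept)
    n_minority = sum(1 for label in y if label == minority)
    # Majority samples ordered from least to most important (hypothetical weights).
    queue = sorted((i for i in kept if y[i] == majority), key=lambda i: weights[i])
    n_majority, pos = len(queue), 0
    while n_majority > n_minority:
        batch = set(queue[pos:pos + min(step, n_majority - n_minority)])
        trial = [i for i in kept if i not in batch]
        trial_score = score(trial)
        if trial_score < best:  # removing this batch would hurt validation F-measure
            break
        kept, best = trial, trial_score
        pos += len(batch)
        n_majority -= len(batch)
    return kept


# Toy 1-D data: five majority points (label 0) and two minority points (label 1).
x = [(0.0,), (1.0,), (2.0,), (3.0,), (4.4,), (5.0,), (6.0,)]
y = [0, 0, 0, 0, 0, 1, 1]
weights = [0.9, 0.8, 0.1, 0.2, 0.05, 1.0, 1.0]  # made-up importance scores
val_x, val_y = [(0.5,), (4.6,), (5.5,)], [0, 1, 1]
kept = gradual_undersample(x, y, weights, val_x, val_y)
print(kept)
```

In this toy run the lowest-weight majority point sits next to the minority region, so removing it first raises the validation F-measure; elimination then continues until the two classes have equal size.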