A Novel Oversampling Method for Imbalanced Datasets Based on Density Peaks Clustering

Imbalanced data classification is a major challenge in the field of data mining and machine learning, and oversampling algorithms are a widespread technique for re-sampling imbalanced data. To address the problems that existing oversampling methods tend to introduce noise points and generate overlap...

Bibliographic Details
Main Authors: Jie Cao*, Yong Shi
Format: Article
Language: English
Published: Faculty of Mechanical Engineering in Slavonski Brod, Faculty of Electrical Engineering in Osijek, Faculty of Civil Engineering in Osijek 2021-01-01
Series: Tehnički Vjesnik
Subjects: classification; density peaks clustering; imbalanced datasets; over sampling
Online Access: https://hrcak.srce.hr/file/383542
_version_ 1827282212616142848
author Jie Cao*
Yong Shi
author_facet Jie Cao*
Yong Shi
author_sort Jie Cao*
collection DOAJ
description Imbalanced data classification is a major challenge in data mining and machine learning, and oversampling is a widely used technique for re-sampling imbalanced data. Existing oversampling methods tend to introduce noise points and generate overlapping instances. To address these problems, this paper proposes a novel oversampling method based on density peaks clustering. First, the density peaks clustering algorithm is used to cluster the minority instances while screening out outliers. Second, sampling weights are assigned according to the size of each sub-cluster, and new instances are synthesized by interpolating between the cluster core and other instances of the same sub-cluster. Finally, comparative experiments are conducted on both artificial data and KEEL datasets. The experiments validate the feasibility and effectiveness of the algorithm and show that it improves classification accuracy on imbalanced data.
first_indexed 2024-04-24T09:14:20Z
format Article
id doaj.art-ed656071552d4e329cffb8585eee5a99
institution Directory Open Access Journal
issn 1330-3651
1848-6339
language English
last_indexed 2024-04-24T09:14:20Z
publishDate 2021-01-01
publisher Faculty of Mechanical Engineering in Slavonski Brod, Faculty of Electrical Engineering in Osijek, Faculty of Civil Engineering in Osijek
record_format Article
series Tehnički Vjesnik
spelling doaj.art-ed656071552d4e329cffb8585eee5a99
2024-04-15T17:13:21Z
eng
Faculty of Mechanical Engineering in Slavonski Brod, Faculty of Electrical Engineering in Osijek, Faculty of Civil Engineering in Osijek
Tehnički Vjesnik
1330-3651
1848-6339
2021-01-01
vol. 28, no. 6, pp. 1813-1819
10.17559/TV-20210608123522
A Novel Oversampling Method for Imbalanced Datasets Based on Density Peaks Clustering
Jie Cao* (0): Nanjing University of Information Science & Technology, No. 219, Ningliu Road, Nanjing, Jiangsu, China; Xuzhou University of Technology, No. 2 Lishui Road, Xuzhou, Jiangsu, China
Yong Shi (1): Nanjing University of Information Science & Technology, School of Mathematics and Statistics, No. 219, Ningliu Road, Nanjing, Jiangsu, China
title A Novel Oversampling Method for Imbalanced Datasets Based on Density Peaks Clustering
title_sort novel oversampling method for imbalanced datasets based on density peaks clustering
topic classification
density peaks clustering
imbalanced datasets
over sampling
url https://hrcak.srce.hr/file/383542
work_keys_str_mv AT jiecao anoveloversamplingmethodforimbalanceddatasetsbasedondensitypeaksclustering
AT yongshi anoveloversamplingmethodforimbalanceddatasetsbasedondensitypeaksclustering
AT jiecao noveloversamplingmethodforimbalanceddatasetsbasedondensitypeaksclustering
AT yongshi noveloversamplingmethodforimbalanceddatasetsbasedondensitypeaksclustering
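
The pipeline summarized in the description field (cluster the minority class with density peaks clustering, screen outliers, weight sub-clusters by size, interpolate between each cluster core and members of the same sub-cluster) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the record does not give the cutoff distance, the outlier criterion, or the exact weighting formula, so the Gaussian density kernel, the quantile-based cutoff, the low-density outlier rule, the size-proportional weights, and all function and parameter names below are assumptions made for illustration only.

```python
import numpy as np

def density_peaks(X, n_clusters, dc_quantile=0.2):
    """Toy density peaks clustering. Returns (labels, core indices, densities)."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Assumption: cutoff distance d_c is a quantile of the pairwise distances.
    d_c = np.quantile(dist[np.triu_indices(n, k=1)], dc_quantile)
    # Local density with a Gaussian kernel; subtract 1 to exclude the point itself.
    rho = np.exp(-(dist / d_c) ** 2).sum(axis=1) - 1.0
    order = np.argsort(-rho)  # indices sorted by descending density
    delta = np.zeros(n)       # distance to the nearest point of higher density
    nearest_higher = np.full(n, -1)
    delta[order[0]] = dist[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i], nearest_higher[i] = dist[i, j], j
    # Cluster cores: largest rho * delta; the densest point is always a core.
    gamma = rho * delta
    gamma[order[0]] = np.inf
    cores = np.argsort(-gamma)[:n_clusters]
    labels = np.full(n, -1)
    labels[cores] = np.arange(n_clusters)
    # Remaining points inherit the label of their nearest higher-density neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, cores, rho

def dpc_oversample(X_min, n_new, n_clusters=2, outlier_quantile=0.1, seed=0):
    """Generate n_new synthetic minority points (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    labels, cores, rho = density_peaks(X_min, n_clusters)
    # Assumption: points in the lowest density quantile are treated as outliers.
    keep = rho >= np.quantile(rho, outlier_quantile)
    keep[cores] = True
    # Assumption: sampling weight proportional to (screened) sub-cluster size.
    sizes = np.array([(keep & (labels == k)).sum() for k in range(n_clusters)])
    counts = rng.multinomial(n_new, sizes / sizes.sum())
    synthetic = []
    for k, m in enumerate(counts):
        members = np.flatnonzero(keep & (labels == k))
        partners = X_min[rng.choice(members, size=m)]
        u = rng.random((m, 1))  # interpolation factor in [0, 1)
        core = X_min[cores[k]]
        synthetic.append(core + u * (partners - core))
    return np.vstack(synthetic)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical minority class made of two well-separated blobs.
    X_min = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (15, 2))])
    print(dpc_oversample(X_min, n_new=30).shape)
```

Because every synthetic point lies on a segment between a cluster core and a retained member of the same sub-cluster, the new instances stay inside existing minority regions, which is the property the description credits with reducing noise and overlap.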