A New Clustering Method Based on the Inversion Formula

Data clustering is an area of data mining that belongs to unsupervised learning. Cluster analysis divides data into classes by discovering the internal structure of the objects in a data set and the relationships among them. This paper presents a new density clustering method based...


Bibliographic Details
Main Authors:	Mantas Lukauskas, Tomas Ruzgas
Format:	Article
Language:	English
Published:	MDPI AG 2022-07-01
Series:	Mathematics
Subjects:	artificial intelligence; unsupervised machine learning; clustering; nonparametric density estimation; inversion formula
Online Access:	https://www.mdpi.com/2227-7390/10/15/2559

author	Mantas Lukauskas, Tomas Ruzgas
author_facet	Mantas Lukauskas, Tomas Ruzgas
author_sort	Mantas Lukauskas
collection	DOAJ
description	Data clustering is an area of data mining that belongs to unsupervised learning. Cluster analysis divides data into classes by discovering the internal structure of the objects in a data set and the relationships among them. This paper presents a new density clustering method based on the modified inversion formula density estimation. The method is intended to improve on the performance and robustness of k-means, the Gaussian mixture model, and other methods. The proposed clustering algorithm consists of three main steps. First, parameters are initialized and a T matrix is generated. Second, the densities of each point and cluster are estimated. Third, the mean, sigma, and phi matrices are updated. Across different datasets, the new method performs well compared with k-means, the Gaussian mixture model, and the Bayesian Gaussian mixture model. The method also has a limitation: in its current state it cannot handle higher-dimensional data (d > 15). The authors plan to resolve this in future versions of the model, as detailed in the paper's future work. The results further show that the MIDEv2 method performs best on generated data with outliers at every tested outlier level (0.5%, 1%, 2%, 4%). Notably, the method can also cluster data that contain no outliers; one popular example is the Iris dataset.
first_indexed	2024-03-09T12:23:41Z
format	Article
id	doaj.art-64c4cc39d04346109025a9fcb3ef2a9d
institution	Directory Open Access Journal
issn	2227-7390
language	English
last_indexed	2024-03-09T12:23:41Z
publishDate	2022-07-01
publisher	MDPI AG
record_format	Article
series	Mathematics
spelling	doaj.art-64c4cc39d04346109025a9fcb3ef2a9d; 2023-11-30T22:37:13Z; eng; MDPI AG; Mathematics; 2227-7390; 2022-07-01; 10; 15; 2559; 10.3390/math10152559; A New Clustering Method Based on the Inversion Formula; Mantas Lukauskas (0); Tomas Ruzgas (1); Department of Applied Mathematics, Faculty of Mathematics and Natural Sciences, Kaunas University of Technology, 44249 Kaunas, Lithuania (both authors); https://www.mdpi.com/2227-7390/10/15/2559; artificial intelligence; unsupervised machine learning; clustering; nonparametric density estimation; inversion formula
title	A New Clustering Method Based on the Inversion Formula
topic	artificial intelligence; unsupervised machine learning; clustering; nonparametric density estimation; inversion formula
url	https://www.mdpi.com/2227-7390/10/15/2559
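
The subject terms of this record name nonparametric density estimation through an inversion formula. As a rough illustration only, the Python sketch below shows the textbook one-dimensional form of that idea: the empirical characteristic function of a sample is damped by a Gaussian factor and inverted numerically, which reproduces an ordinary Gaussian kernel density estimate. This is not the modified estimator (MIDEv2) described in the article, whose details this record does not give; the function names, the bandwidth `h`, the cutoff `t_max`, and the sample values are all assumptions made for the example.

```python
import math


def inversion_density(x, sample, h=0.5, t_max=20.0, steps=4000):
    """Density estimate at x from the inversion formula
    f(x) = (1 / 2 pi) * integral of exp(-i t x) * phi_n(t) * exp(-(h t)^2 / 2) dt,
    where phi_n(t) = (1 / n) * sum_j exp(i t x_j) is the empirical
    characteristic function. h, t_max and steps are illustrative choices.
    """
    n = len(sample)
    dt = t_max / steps
    total = 0.0
    for k in range(steps + 1):
        t = k * dt
        # Real part of exp(-i t x) * phi_n(t); the imaginary part cancels
        # when integrating over a range symmetric around zero.
        real_part = sum(math.cos(t * (xj - x)) for xj in sample) / n
        damped = real_part * math.exp(-0.5 * (h * t) ** 2)
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoid rule end points
        total += weight * damped
    # The integrand is even in t, so integrate over [0, t_max] and double:
    # (1 / 2 pi) * 2 * integral = integral / pi.
    return total * dt / math.pi


def gaussian_kde(x, sample, h=0.5):
    """Plain Gaussian kernel density estimate, used here as a cross-check."""
    n = len(sample)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xj) / h) ** 2) for xj in sample)


if __name__ == "__main__":
    data = [-1.2, -0.8, 0.1, 0.9, 1.1, 3.0]  # made-up sample
    for point in (-1.0, 0.0, 2.5):
        print(point, inversion_density(point, data), gaussian_kde(point, data))
```

The two functions should agree closely because the Gaussian damping factor is the characteristic function of the Gaussian kernel; a clustering method built on density estimates would evaluate such a density per point and per cluster, but how the article does so is not stated here.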

