The Fundamental Clustering and Projection Suite (FCPS): A Dataset Collection to Test the Performance of Clustering and Data Projection Algorithms

In the context of data science, data projection and clustering are common procedures. The chosen analysis method is crucial to avoid faulty pattern recognition. It is therefore necessary to know the properties and especially the limitations of projection and clustering algorithms. This report descri...

Bibliographic Details
Main Authors: Alfred Ultsch, Jörn Lötsch
Format: Article
Language: English
Published: MDPI AG 2020-01-01
Series: Data
Subjects:
Online Access: https://www.mdpi.com/2306-5729/5/1/13
_version_ 1811303259373043712
author Alfred Ultsch
Jörn Lötsch
collection DOAJ
description In the context of data science, data projection and clustering are common procedures. The chosen analysis method is crucial to avoid faulty pattern recognition. It is therefore necessary to know the properties and especially the limitations of projection and clustering algorithms. This report describes a collection of datasets that are grouped together in the Fundamental Clustering and Projection Suite (FCPS). It is designed to address specific problems of structure discovery in high-dimensional spaces. The FCPS contains 10 datasets with the names "Atom", "Chainlink", "EngyTime", "Golfball", "Hepta", "Lsun", "Target", "Tetra", "TwoDiamonds", and "WingNut". Common clustering methods occasionally identified non-existent clusters or assigned data points to the wrong clusters in the FCPS suite. Likewise, common data projection methods could only partially reproduce the data structure correctly on a two-dimensional plane. In conclusion, the FCPS dataset collection addresses general challenges for clustering and projection algorithms such as lack of linear separability, different or small inner class spacing, classes defined by data density rather than data spacing, no cluster structure at all, outliers, or classes that are in contact.
first_indexed 2024-04-13T07:43:38Z
format Article
id doaj.art-62a88b5a4f2f4394abc0882f4a0fa2a3
institution Directory Open Access Journal
issn 2306-5729
language English
last_indexed 2024-04-13T07:43:38Z
publishDate 2020-01-01
publisher MDPI AG
record_format Article
series Data
spelling doaj.art-62a88b5a4f2f4394abc0882f4a0fa2a3 | 2022-12-22T02:55:45Z | eng | MDPI AG | Data | 2306-5729 | 2020-01-01 | 5 | 1 | 13 | 10.3390/data5010013 | data5010013 | The Fundamental Clustering and Projection Suite (FCPS): A Dataset Collection to Test the Performance of Clustering and Data Projection Algorithms | Alfred Ultsch (0), Jörn Lötsch (1) | (0) DataBionics Research Institute, University of Marburg, Hans-Meerwein-Straße, 35032 Marburg, Germany | (1) Institute of Clinical Pharmacology, Goethe-University, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany | In the context of data science, data projection and clustering are common procedures. The chosen analysis method is crucial to avoid faulty pattern recognition. It is therefore necessary to know the properties and especially the limitations of projection and clustering algorithms. This report describes a collection of datasets that are grouped together in the Fundamental Clustering and Projection Suite (FCPS). It is designed to address specific problems of structure discovery in high-dimensional spaces. The FCPS contains 10 datasets with the names "Atom", "Chainlink", "EngyTime", "Golfball", "Hepta", "Lsun", "Target", "Tetra", "TwoDiamonds", and "WingNut". Common clustering methods occasionally identified non-existent clusters or assigned data points to the wrong clusters in the FCPS suite. Likewise, common data projection methods could only partially reproduce the data structure correctly on a two-dimensional plane. In conclusion, the FCPS dataset collection addresses general challenges for clustering and projection algorithms such as lack of linear separability, different or small inner class spacing, classes defined by data density rather than data spacing, no cluster structure at all, outliers, or classes that are in contact. | https://www.mdpi.com/2306-5729/5/1/13 | dataset: available as a supplementary file in this submission. link www.mdpi.com/xxx/s1.
title The Fundamental Clustering and Projection Suite (FCPS): A Dataset Collection to Test the Performance of Clustering and Data Projection Algorithms
topic dataset: available as a supplementary file in this submission. link www.mdpi.com/xxx/s1.
url https://www.mdpi.com/2306-5729/5/1/13