The Fundamental Clustering and Projection Suite (FCPS): A Dataset Collection to Test the Performance of Clustering and Data Projection Algorithms
In the context of data science, data projection and clustering are common procedures. The chosen analysis method is crucial to avoid faulty pattern recognition. It is therefore necessary to know the properties and especially the limitations of projection and clustering algorithms. This report descri...
Main Authors: | Alfred Ultsch, Jörn Lötsch |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2020-01-01 |
Series: | Data |
Subjects: | |
Online Access: | https://www.mdpi.com/2306-5729/5/1/13 |
_version_ | 1811303259373043712 |
---|---|
author | Alfred Ultsch Jörn Lötsch |
author_facet | Alfred Ultsch Jörn Lötsch |
author_sort | Alfred Ultsch |
collection | DOAJ |
description | In the context of data science, data projection and clustering are common procedures. The chosen analysis method is crucial to avoid faulty pattern recognition. It is therefore necessary to know the properties and especially the limitations of projection and clustering algorithms. This report describes a collection of datasets that are grouped together in the Fundamental Clustering and Projection Suite (FCPS). The FCPS contains 10 datasets with the names "Atom", "Chainlink", "EngyTime", "Golfball", "Hepta", "Lsun", "Target", "Tetra", "TwoDiamonds", and "WingNut". Common clustering methods occasionally identified non-existent clusters or assigned data points to the wrong clusters in the FCPS suite. Likewise, common data projection methods could only partially reproduce the data structure correctly on a two-dimensional plane. In conclusion, the FCPS dataset collection addresses general challenges for clustering and projection algorithms such as lack of linear separability, different or small inner class spacing, classes defined by data density rather than data spacing, no cluster structure at all, outliers, or classes that are in contact. The suite is designed to address specific problems of structure discovery in high-dimensional spaces. |
first_indexed | 2024-04-13T07:43:38Z |
format | Article |
id | doaj.art-62a88b5a4f2f4394abc0882f4a0fa2a3 |
institution | Directory Open Access Journal |
issn | 2306-5729 |
language | English |
last_indexed | 2024-04-13T07:43:38Z |
publishDate | 2020-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Data |
spelling | doaj.art-62a88b5a4f2f4394abc0882f4a0fa2a3; 2022-12-22T02:55:45Z; eng; MDPI AG; Data; 2306-5729; 2020-01-01; 5; 1; 13; 10.3390/data5010013; data5010013; The Fundamental Clustering and Projection Suite (FCPS): A Dataset Collection to Test the Performance of Clustering and Data Projection Algorithms; Alfred Ultsch 0; Jörn Lötsch 1; DataBionics Research Institute, University of Marburg, Hans-Meerwein-Straße, 35032 Marburg, Germany; Institute of Clinical Pharmacology, Goethe - University, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany; In the context of data science, data projection and clustering are common procedures. The chosen analysis method is crucial to avoid faulty pattern recognition. It is therefore necessary to know the properties and especially the limitations of projection and clustering algorithms. This report describes a collection of datasets that are grouped together in the Fundamental Clustering and Projection Suite (FCPS). The FCPS contains 10 datasets with the names "Atom", "Chainlink", "EngyTime", "Golfball", "Hepta", "Lsun", "Target", "Tetra", "TwoDiamonds", and "WingNut". Common clustering methods occasionally identified non-existent clusters or assigned data points to the wrong clusters in the FCPS suite. Likewise, common data projection methods could only partially reproduce the data structure correctly on a two-dimensional plane. In conclusion, the FCPS dataset collection addresses general challenges for clustering and projection algorithms such as lack of linear separability, different or small inner class spacing, classes defined by data density rather than data spacing, no cluster structure at all, outliers, or classes that are in contact. This report describes a collection of datasets that are grouped together in the Fundamental Clustering and Projection Suite (FCPS). It is designed to address specific problems of structure discovery in high-dimensional spaces.; https://www.mdpi.com/2306-5729/5/1/13; dataset: available as a supplementary file in this submission. link www.mdpi.com/xxx/s1. |
spellingShingle | Alfred Ultsch Jörn Lötsch The Fundamental Clustering and Projection Suite (FCPS): A Dataset Collection to Test the Performance of Clustering and Data Projection Algorithms Data dataset: available as a supplementary file in this submission. link www.mdpi.com/xxx/s1. |
title | The Fundamental Clustering and Projection Suite (FCPS): A Dataset Collection to Test the Performance of Clustering and Data Projection Algorithms |
title_full | The Fundamental Clustering and Projection Suite (FCPS): A Dataset Collection to Test the Performance of Clustering and Data Projection Algorithms |
title_fullStr | The Fundamental Clustering and Projection Suite (FCPS): A Dataset Collection to Test the Performance of Clustering and Data Projection Algorithms |
title_full_unstemmed | The Fundamental Clustering and Projection Suite (FCPS): A Dataset Collection to Test the Performance of Clustering and Data Projection Algorithms |
title_short | The Fundamental Clustering and Projection Suite (FCPS): A Dataset Collection to Test the Performance of Clustering and Data Projection Algorithms |
title_sort | fundamental clustering and projection suite fcps a dataset collection to test the performance of clustering and data projection algorithms |
topic | dataset: available as a supplementary file in this submission. link www.mdpi.com/xxx/s1. |
url | https://www.mdpi.com/2306-5729/5/1/13 |
work_keys_str_mv | AT alfredultsch thefundamentalclusteringandprojectionsuitefcpsadatasetcollectiontotesttheperformanceofclusteringanddataprojectionalgorithms AT jornlotsch thefundamentalclusteringandprojectionsuitefcpsadatasetcollectiontotesttheperformanceofclusteringanddataprojectionalgorithms AT alfredultsch fundamentalclusteringandprojectionsuitefcpsadatasetcollectiontotesttheperformanceofclusteringanddataprojectionalgorithms AT jornlotsch fundamentalclusteringandprojectionsuitefcpsadatasetcollectiontotesttheperformanceofclusteringanddataprojectionalgorithms |