Structural <i>k</i>-means (S <i>k</i>-means) and clustering uncertainty evaluation framework (CUEF) for mining climate data
Main Authors: | , , , , , |
---|---|
Format: | Article |
Language: | English |
Published: | Copernicus Publications 2023-04-01 |
Series: | Geoscientific Model Development |
Online Access: | https://gmd.copernicus.org/articles/16/2215/2023/gmd-16-2215-2023.pdf |
Summary: | <p>Dramatic increases in climate data underlie a gradual
paradigm shift in knowledge acquisition methods from physically based models
to data-based mining approaches. One of the most popular data clustering/mining techniques is <span class="inline-formula"><i>k</i></span>-means, which has been used to
detect hidden patterns in climate systems. However, <span class="inline-formula"><i>k</i></span>-means relies on distance metrics for
pattern recognition, which makes it relatively ineffective when dealing with “structured” data, that is,
data in time and space domains, which are dominant in climate science. Here, we propose (i) a novel structural-similarity-recognition-based <span class="inline-formula"><i>k</i></span>-means algorithm, called structural <span class="inline-formula"><i>k</i></span>-means or S <span class="inline-formula"><i>k</i></span>-means, for
climate data mining and (ii) a new clustering uncertainty representation/evaluation framework based on the information entropy concept. We
demonstrate that the novel S <span class="inline-formula"><i>k</i></span>-means can provide higher-quality clustering
outcomes in terms of general silhouette analysis, although it requires
more computational resources than conventional algorithms. The
results are consistent across different demonstration problem settings using
different types of input data, including two-dimensional weather patterns,
historical climate change in terms of time series, and tropical cyclone
paths. Additionally, by quantifying the uncertainty underlying the
clustering outcomes, we evaluated, for the first time, the “meaningfulness”
of applying a given clustering algorithm to a given dataset. We expect that
this study will establish a new standard for <span class="inline-formula"><i>k</i></span>-means clustering with
“structural” input data, as well as a new framework for uncertainty
representation/evaluation of clustering algorithms for (but not limited to)
climate science.</p> |
---|---|
ISSN: | 1991-959X 1991-9603 |