A method for assessment of the general circulation model quality using the K-means clustering algorithm: a case study with GETM v2.5

Bibliographic Details
Main Authors:	U. Raudsepp, I. Maljutenko
Format:	Article
Language:	English
Published:	Copernicus Publications 2022-01-01
Series:	Geoscientific Model Development
Online Access:	https://gmd.copernicus.org/articles/15/535/2022/gmd-15-535-2022.pdf

collection	DOAJ
description	The ability of a model to reproduce the state of the simulated object, or a particular feature or phenomenon, is always a subject of discussion. Multidimensional model quality assessment is usually customized for the specific focus of the study and often covers only a limited number of locations. In this paper, we propose a method that provides information on the accuracy of the model in general while retaining all dimensional information for posterior analysis of specific tasks. The main goal of the method is to cluster the multivariate model errors. The clustering is done using the K-means algorithm of unsupervised machine learning. In addition, the potential application of K-means clustering of model errors for learning and predicting is shown. The method is tested on the 40-year simulation results of a general circulation model of the Baltic Sea. The model results are evaluated against temperature and salinity measurements from more than 1 million casts by forming a two-dimensional error space and performing a clustering procedure in it. The optimal number of clusters, four, was determined using the Elbow cluster selection criterion and an analysis of different numbers of error clusters. In this particular model, the error cluster representing good model quality, with a bias of 0.4 °C (SD = 0.8 °C) for temperature and 0.6 g kg⁻¹ (SD = 0.7 g kg⁻¹) for salinity, made up 57 % of all comparison data pairs. The prediction of centroids from a limited number of randomly selected data showed that the obtained centroids became stable once the learning dataset contained at least 100 000 error pairs.
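The workflow summarized in the description (a two-dimensional error space, K-means clustering, an Elbow-style comparison over several cluster counts, and a check of centroid stability on a random subsample) can be illustrated with a minimal sketch. This is not the authors' code. The data are synthetic, every cluster centre except (0.4, 0.6) is hypothetical, and the function names `kmeans`, `synthetic_errors` and `centroid_shift` are introduced here only for illustration.

```python
import numpy as np


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's K-means. Returns (centroids, labels, inertia)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct points drawn at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centroid to the mean of its members; an empty cluster stays put.
        new = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    inertia = d2[np.arange(len(points)), labels].sum()
    return centroids, labels, inertia


def synthetic_errors(n=4000, seed=1):
    """Synthetic (temperature error, salinity error) pairs; NOT the paper's data."""
    rng = np.random.default_rng(seed)
    # Hypothetical cluster centres in (deg C, g/kg); only the first echoes the abstract.
    centres = np.array([[0.4, 0.6], [4.0, 0.5], [-3.0, -0.5], [0.5, 4.0]])
    return centres[rng.integers(0, len(centres), size=n)] + rng.normal(scale=0.7, size=(n, 2))


def centroid_shift(a, b):
    """Largest distance from a centroid in `a` to its nearest centroid in `b`."""
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2))
    return d.min(axis=1).max()


errors = synthetic_errors()

# Elbow-style comparison: total within-cluster sum of squares for K = 1..7.
inertias = [kmeans(errors, k)[2] for k in range(1, 8)]

# Cluster with K = 4 and report the share of data pairs in each cluster.
centroids, labels, inertia = kmeans(errors, 4)
share = np.bincount(labels, minlength=4) / len(errors)

# Stability check: refit on a random subsample and compare the centroids.
subset = errors[np.random.default_rng(2).choice(len(errors), size=500, replace=False)]
shift = centroid_shift(kmeans(subset, 4)[0], centroids)

print("inertia by K:", np.round(inertias, 1))
print("cluster shares:", np.round(share, 3))
print("max centroid shift from subsample:", round(float(shift), 3))
```

Plotting `inertias` against K and looking for the bend is the usual way to read an Elbow curve. Repeating the subsample fit for growing subsample sizes shows how quickly the centroids settle, which is the kind of stability question the description raises.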
first_indexed	2024-04-11T18:12:49Z
id	doaj.art-3d86123c71704bc2a014ab88e6a1a65b
institution	Directory Open Access Journal
issn	1991-959X 1991-9603
last_indexed	2024-04-11T18:12:49Z
doi	10.5194/gmd-15-535-2022
volume	15
pages	535-551

Similar Items