Parameter Choice, Stability and Validity for Robust Cluster Weighted Modeling

Bibliographic Details
Main Authors: Andrea Cappozzo, Luis Angel García Escudero, Francesca Greselin, Agustín Mayo-Iscar
Affiliations:
Andrea Cappozzo: MOX-Department of Mathematics, Politecnico di Milano, 20133 Milan, Italy
Luis Angel García Escudero: Departamento de Estadística e Investigación Operativa, Facultad de Ciencias, Universidad de Valladolid, 47002 Valladolid, Spain
Francesca Greselin: Department of Statistics and Quantitative Methods, University of Milano-Bicocca, 20126 Milan, Italy
Agustín Mayo-Iscar: Departamento de Estadística e Investigación Operativa, Facultad de Ciencias, Universidad de Valladolid, 47002 Valladolid, Spain
Format: Article
Language: English
Published: MDPI AG 2021-07-01
Series: Stats, vol. 4, no. 3, pp. 602-615
ISSN: 2571-905X
DOI: 10.3390/stats4030036
Collection: Directory of Open Access Journals (DOAJ)
Subjects: cluster-weighted modeling, outliers, trimmed BIC, eigenvalue constraint, monitoring, constrained estimation
Online Access: https://www.mdpi.com/2571-905X/4/3/36

Description

Statistical inference based on the cluster weighted model often requires some subjective judgment from the modeler. Many features influence the final solution, such as the number of mixture components, the shape of the clusters in the explanatory variables, and the degree of heteroscedasticity of the errors around the regression lines. Moreover, to deal with outliers and contamination that may appear in the data, hyper-parameter values ensuring robust estimation are also needed. In principle, this freedom gives rise to a variety of “legitimate” solutions, each derived from a specific set of choices and their implications in modeling. Here we introduce a method for identifying a “set of good models” to cluster a dataset, considering the whole panorama of choices. In this way, we enable the practitioner, or the scientist who needs to cluster the data, to make an educated choice. They will be able to identify the most appropriate solutions for the purposes of their own analysis, in light of their stability and validity.