Efficient algorithm for testing goodness-of-fit for classification of high dimensional data

Bibliographic Details
Main Author: Gintautas Jakimauskas
Format: Article
Language: English
Published: Vilnius University Press 2009-12-01
Series: Lietuvos Matematikos Rinkinys
Online Access: https://www.journals.vu.lt/LMR/article/view/17982
Description
Summary: Consider a sample from a d-dimensional Gaussian mixture model, where d is assumed to be large, and the problem of classifying this sample. Because the dimension is large, it is natural to project the sample onto k-dimensional (k = 1, 2, ...) linear subspaces, using the projection pursuit method to select the best such subspaces. Given an estimate of the discriminant subspace, classification can be performed on the projected sample, thus avoiding the 'curse of dimensionality'. An essential step in this method is testing the goodness-of-fit of the estimated d-dimensional model under the assumption that the distribution on the complement space is standard Gaussian. We present a simple, data-driven and computationally efficient procedure for testing goodness-of-fit. The procedure is based on the well-known interpretation of goodness-of-fit testing as a classification problem, a special sequential data partition procedure, randomization and resampling, and elements of sequential testing. Monte Carlo simulations are used to assess the performance of the procedure.
ISSN: 0132-2818, 2335-898X
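
The Summary's interpretation of goodness-of-fit testing as a classification problem can be illustrated with a minimal sketch. This is not the article's procedure: the sequential data partition and sequential-testing elements are not reproduced here. Instead, the sketch swaps in a generic classifier two-sample test, which pools the observed sample with a sample simulated from the fitted model, scores how well a leave-one-out 1-nearest-neighbour rule separates the two, and calibrates that score by permuting the labels. The function names, the choice of classifier, and the parameters `n_perm` and `seed` are all assumptions made for illustration.

```python
import numpy as np


def nearest_neighbour_index(points):
    """Index of each point's nearest neighbour, excluding the point itself."""
    sq = np.sum(points**2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * points @ points.T
    np.fill_diagonal(dist, np.inf)  # a point may not be its own neighbour
    return np.argmin(dist, axis=1)


def classifier_gof_test(observed, simulated, n_perm=200, seed=0):
    """Hypothetical helper: classifier-based goodness-of-fit test.

    observed  : (n, d) array, the data sample
    simulated : (m, d) array, drawn from the fitted model
    Returns (leave-one-out 1-NN accuracy, permutation p-value).
    Accuracy near 0.5 means the two samples are hard to tell apart.
    """
    rng = np.random.default_rng(seed)
    pooled = np.vstack([observed, simulated])
    labels = np.concatenate([
        np.zeros(len(observed), dtype=int),   # 0 = observed
        np.ones(len(simulated), dtype=int),   # 1 = simulated from the model
    ])
    nearest = nearest_neighbour_index(pooled)  # geometry is fixed, so compute once

    def accuracy(lab):
        return float(np.mean(lab[nearest] == lab))

    stat = accuracy(labels)
    # Randomization step: under the null hypothesis the labels are exchangeable.
    null = np.array([accuracy(rng.permutation(labels)) for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= stat)) / (n_perm + 1)
    return stat, float(p_value)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    data = rng.standard_normal((200, 10))        # stand-in for an observed sample
    model_draws = rng.standard_normal((200, 10))  # stand-in for draws from a fitted model
    print(classifier_gof_test(data, model_draws))
```

A small p-value indicates that the classifier separates observed from simulated points better than chance, which is evidence against the fitted model. In the setting the Summary describes, `simulated` would be drawn from the estimated mixture model with a standard Gaussian distribution on the complement space; how that estimate is obtained is outside this sketch.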