Efficient algorithm for testing goodness-of-fit for classification of high dimensional data
Main Author: | |
---|---|
Format: | Article |
Language: | English |
Published: | Vilnius University Press, 2009-12-01 |
Series: | Lietuvos Matematikos Rinkinys |
Subjects: | |
Online Access: | https://www.journals.vu.lt/LMR/article/view/17982 |
Summary: | Consider a sample from a d-dimensional Gaussian mixture model, where d is assumed to be large, and the problem of classifying that sample. Because the dimension is large, it is natural to project the sample onto k-dimensional (k = 1, 2, ...) linear subspaces, using the projection pursuit method to select the best subspaces. Given an estimate of the discriminant subspace, classification can be performed on the projected sample, thus avoiding the 'curse of dimensionality'. An essential step in this method is testing the goodness-of-fit of the estimated d-dimensional model, assuming that the distribution on the complement space is standard Gaussian. We present a simple, data-driven and computationally efficient procedure for testing goodness-of-fit. The procedure is based on the well-known interpretation of goodness-of-fit testing as a classification problem, a special sequential data partition procedure, randomization and resampling, and elements of sequential testing. Monte Carlo simulations are used to assess the performance of the procedure. |
---|---|
ISSN: | 0132-2818 2335-898X |
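
The summary's idea of treating goodness-of-fit testing as a classification problem can be illustrated with a minimal sketch. This is not the article's procedure: it swaps in a plain 1-nearest-neighbour rule for the sequential data partition, and uses a label permutation test for the resampling step. The names `classifier_gof_pvalue`, `standard_gaussian`, `n_perm` and `seed` are hypothetical. The observed sample is pooled with a sample simulated from the hypothesised model; if the model fits, a point's nearest neighbour should share its label only about half the time.

```python
import numpy as np


def standard_gaussian(rng, n, d):
    """Draw n points from the hypothesised model (here: standard Gaussian in d dims)."""
    return rng.standard_normal((n, d))


def nn_same_label_fraction(labels, nn_index):
    """Fraction of points whose nearest neighbour carries the same label."""
    return float(np.mean(labels == labels[nn_index]))


def classifier_gof_pvalue(sample, draw_from_model, n_perm=500, seed=0):
    """Permutation p-value for H0: `sample` was drawn from the model.

    Assumption: a 1-NN rule stands in for the classifier; the article's
    sequential partition procedure is not reproduced here.
    """
    rng = np.random.default_rng(seed)
    n, d = sample.shape
    reference = draw_from_model(rng, n, d)

    # Pool observed and simulated points; label 0 = observed, 1 = simulated.
    pooled = np.vstack([sample, reference])
    labels = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])

    # Nearest neighbours depend only on the points, so compute them once.
    sq = np.sum(pooled ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * pooled @ pooled.T
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    nn_index = np.argmin(dist, axis=1)

    observed = nn_same_label_fraction(labels, nn_index)

    # Null distribution: shuffle the labels and recompute the statistic.
    null = np.array([
        nn_same_label_fraction(rng.permutation(labels), nn_index)
        for _ in range(n_perm)
    ])
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, float(p_value)
```

A small p-value suggests that the observed and simulated points are separable, that is, that the hypothesised model fits poorly. Because the neighbour structure is computed once, each permutation costs only a relabelling, which keeps the resampling step cheap.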