A Witness Function Based Construction of Discriminative Models Using Hermite Polynomials
In machine learning, we are given a dataset of the form $\{(x_j, y_j)\}_{j=1}^M$, drawn as i.i.d. samples from an unknown probability distribution $\mu$; the marginal distribution for the $x_j$'s being $\mu^*$, and the marginals of the $k$th class, $\mu_k^*(x)$, possibly overlapping. We address the problem of detecting, with a...
Main Authors: | Hrushikesh N. Mhaskar, Xiuyuan Cheng, Alexander Cloninger |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A. 2020-08-01 |
Series: | Frontiers in Applied Mathematics and Statistics |
Subjects: | generative model; discriminative model; probability estimation; Hermite functions; witness function; 68Q32 |
Online Access: | https://www.frontiersin.org/article/10.3389/fams.2020.00031/full |
_version_ | 1828879759687286784 |
---|---|
author | Hrushikesh N. Mhaskar; Xiuyuan Cheng; Alexander Cloninger |
author_facet | Hrushikesh N. Mhaskar; Xiuyuan Cheng; Alexander Cloninger |
author_sort | Hrushikesh N. Mhaskar |
collection | DOAJ |
description | In machine learning, we are given a dataset of the form $\{(x_j, y_j)\}_{j=1}^M$, drawn as i.i.d. samples from an unknown probability distribution $\mu$; the marginal distribution for the $x_j$'s being $\mu^*$, and the marginals of the $k$th class, $\mu_k^*(x)$, possibly overlapping. We address the problem of detecting, with a high degree of certainty, for which $x$ we have $\mu_k^*(x) > \mu_i^*(x)$ for all $i \ne k$. We propose that rather than using a positive kernel such as the Gaussian to estimate these measures, using a non-positive kernel that preserves a large number of moments of these measures yields an optimal approximation. We use multivariate Hermite polynomials for this purpose, and prove optimal and local approximation results in a supremum norm in a probabilistic sense. Together with a permutation test developed with the same kernel, we prove that the kernel estimator serves as a "witness function" in classification problems. Thus, if the value of this estimator at a point $x$ exceeds a certain threshold, then the point is reliably in a certain class. This approach can be used to modify pretrained algorithms, such as neural networks or nonlinear dimension reduction techniques, to identify in-class vs. out-of-class regions for the purposes of generative models, classification uncertainty, or finding robust centroids. We demonstrate this on a number of real-world data sets, including MNIST, CIFAR10, Science News documents, and the LaLonde data set. |
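Read as a recipe, the abstract suggests a simple construction: build a low-pass-filtered kernel from orthonormal Hermite functions, take the difference of its empirical means over two classes as the witness function, and calibrate a decision threshold with a permutation test. Below is a minimal one-dimensional sketch of that idea, not the authors' code: the piecewise-linear cutoff filter, the degree `n_max`, and helper names such as `witness_threshold` are assumptions introduced only for illustration.

```python
# A minimal 1-D sketch, *not* the paper's implementation; names and parameter
# choices below are illustrative assumptions.
import numpy as np
from scipy.special import eval_hermite, gammaln


def hermite_functions(x, n_max):
    """Orthonormal Hermite functions psi_0, ..., psi_{n_max-1} evaluated at x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    k = np.arange(n_max)
    # log of the normalization 1 / sqrt(2^k k! sqrt(pi))
    log_norm = -0.5 * (k * np.log(2.0) + gammaln(k + 1.0) + 0.5 * np.log(np.pi))
    polys = np.stack([eval_hermite(int(ki), x) for ki in k], axis=1)   # (N, n_max)
    return polys * np.exp(log_norm)[None, :] * np.exp(-0.5 * x ** 2)[:, None]


def hermite_kernel(x, y, n_max):
    """Low-pass Hermite kernel Phi_n(x, y) = sum_k h(k / n_max) psi_k(x) psi_k(y)."""
    k = np.arange(n_max)
    h = np.clip(2.0 - 2.0 * k / n_max, 0.0, 1.0)   # piecewise-linear cutoff filter
    return hermite_functions(x, n_max) @ (h[:, None] * hermite_functions(y, n_max).T)


def witness(x_eval, sample_a, sample_b, n_max=20):
    """Witness F(x): mean kernel value against class A minus mean against class B."""
    return (hermite_kernel(x_eval, sample_a, n_max).mean(axis=1)
            - hermite_kernel(x_eval, sample_b, n_max).mean(axis=1))


def witness_threshold(x_eval, sample_a, sample_b, n_max=20, n_perm=200, q=0.95, seed=1):
    """Pointwise threshold from a label-permutation null: shuffle labels, recompute F."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([sample_a, sample_b])
    n_a = len(sample_a)
    null = np.empty((n_perm, len(np.atleast_1d(x_eval))))
    for t in range(n_perm):
        perm = rng.permutation(len(pooled))
        null[t] = witness(x_eval, pooled[perm[:n_a]], pooled[perm[n_a:]], n_max)
    return np.quantile(null, q, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(-1.0, 0.6, size=400)        # class A sample
    b = rng.normal(+1.0, 0.6, size=400)        # class B sample
    grid = np.linspace(-3.0, 3.0, 7)
    f = witness(grid, a, b)
    thr = witness_threshold(grid, a, b)
    print(np.round(f, 3))
    print(np.round(thr, 3))
    # grid points where f exceeds thr are flagged as reliably in class A
```

Replacing the piecewise-linear cutoff with a smoother filter, or tensorizing the one-dimensional Hermite functions, would bring this sketch closer to the multivariate localized kernels analyzed in the article.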
first_indexed | 2024-12-13T09:34:43Z |
format | Article |
id | doaj.art-38727609aee54a91b59be8b7ac3eaa94 |
institution | Directory Open Access Journal |
issn | 2297-4687 |
language | English |
last_indexed | 2024-12-13T09:34:43Z |
publishDate | 2020-08-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Applied Mathematics and Statistics |
spelling | doaj.art-38727609aee54a91b59be8b7ac3eaa94 (2022-12-21T23:52:24Z); eng; Frontiers Media S.A.; Frontiers in Applied Mathematics and Statistics; ISSN 2297-4687; 2020-08-01; vol. 6; doi: 10.3389/fams.2020.00031; article 564492. A Witness Function Based Construction of Discriminative Models Using Hermite Polynomials. Hrushikesh N. Mhaskar (Institute of Mathematical Sciences, Claremont Graduate University, Claremont, CA, United States); Xiuyuan Cheng (Department of Mathematics, Duke University, Durham, NC, United States); Alexander Cloninger (Department of Mathematics and Halicioglu Data Science Institute, University of California, San Diego, San Diego, CA, United States). Abstract as in the description field above. Keywords: generative model; discriminative model; probability estimation; Hermite functions; witness function; 68Q32. https://www.frontiersin.org/article/10.3389/fams.2020.00031/full |
spellingShingle | Hrushikesh N. Mhaskar; Xiuyuan Cheng; Alexander Cloninger; A Witness Function Based Construction of Discriminative Models Using Hermite Polynomials; Frontiers in Applied Mathematics and Statistics; generative model; discriminative model; probability estimation; Hermite functions; witness function; 68Q32 |
title | A Witness Function Based Construction of Discriminative Models Using Hermite Polynomials |
title_full | A Witness Function Based Construction of Discriminative Models Using Hermite Polynomials |
title_fullStr | A Witness Function Based Construction of Discriminative Models Using Hermite Polynomials |
title_full_unstemmed | A Witness Function Based Construction of Discriminative Models Using Hermite Polynomials |
title_short | A Witness Function Based Construction of Discriminative Models Using Hermite Polynomials |
title_sort | witness function based construction of discriminative models using hermite polynomials |
topic | generative model; discriminative model; probability estimation; Hermite functions; witness function; 68Q32 |
url | https://www.frontiersin.org/article/10.3389/fams.2020.00031/full |
work_keys_str_mv | AT hrushikeshnmhaskar awitnessfunctionbasedconstructionofdiscriminativemodelsusinghermitepolynomials AT xiuyuancheng awitnessfunctionbasedconstructionofdiscriminativemodelsusinghermitepolynomials AT alexandercloninger awitnessfunctionbasedconstructionofdiscriminativemodelsusinghermitepolynomials AT hrushikeshnmhaskar witnessfunctionbasedconstructionofdiscriminativemodelsusinghermitepolynomials AT xiuyuancheng witnessfunctionbasedconstructionofdiscriminativemodelsusinghermitepolynomials AT alexandercloninger witnessfunctionbasedconstructionofdiscriminativemodelsusinghermitepolynomials |