A Witness Function Based Construction of Discriminative Models Using Hermite Polynomials

In machine learning, we are given a dataset of the form $\{(x_j, y_j)\}_{j=1}^{M}$, drawn as i.i.d. samples from an unknown probability distribution $\mu$; the marginal distribution for the $x_j$'s being $\mu^*$, and the marginals of the $k$th class $\mu_k^*(x)$ possibly overlapping. We address the problem of detecting, with a...

Bibliographic Details
Main Authors: Hrushikesh N. Mhaskar, Xiuyuan Cheng, Alexander Cloninger
Format: Article
Language: English
Published: Frontiers Media S.A. 2020-08-01
Series: Frontiers in Applied Mathematics and Statistics
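Subjects: generative model; discriminative model; probability estimation; Hermite functions; witness function; 68Q32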
Online Access: https://www.frontiersin.org/article/10.3389/fams.2020.00031/full
author Hrushikesh N. Mhaskar
Xiuyuan Cheng
Alexander Cloninger
collection DOAJ
description In machine learning, we are given a dataset of the form $\{(x_j, y_j)\}_{j=1}^{M}$, drawn as i.i.d. samples from an unknown probability distribution $\mu$; the marginal distribution for the $x_j$'s being $\mu^*$, and the marginals of the $k$th class $\mu_k^*(x)$ possibly overlapping. We address the problem of detecting, with a high degree of certainty, for which $x$ we have $\mu_k^*(x) > \mu_i^*(x)$ for all $i \neq k$. We propose that rather than using a positive kernel such as the Gaussian for estimation of these measures, using a non-positive kernel that preserves a large number of moments of these measures yields an optimal approximation. We use multi-variate Hermite polynomials for this purpose, and prove optimal and local approximation results in a supremum norm in a probabilistic sense. Together with a permutation test developed with the same kernel, we prove that the kernel estimator serves as a “witness function” in classification problems. Thus, if the value of this estimator at a point $x$ exceeds a certain threshold, then the point is reliably in a certain class. This approach can be used to modify pretrained algorithms, such as neural networks or nonlinear dimension reduction techniques, to identify in-class vs out-of-class regions for the purposes of generative models, classification uncertainty, or finding robust centroids. This fact is demonstrated in a number of real world data sets including MNIST, CIFAR10, Science News documents, and LaLonde data sets.
first_indexed 2024-12-13T09:34:43Z
format Article
id doaj.art-38727609aee54a91b59be8b7ac3eaa94
institution Directory Open Access Journal
issn 2297-4687
language English
last_indexed 2024-12-13T09:34:43Z
publishDate 2020-08-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Applied Mathematics and Statistics
spelling doaj.art-38727609aee54a91b59be8b7ac3eaa94; 2022-12-21T23:52:24Z; eng; Frontiers Media S.A.; Frontiers in Applied Mathematics and Statistics; 2297-4687; 2020-08-01; 6; 10.3389/fams.2020.00031; 564492
Hrushikesh N. Mhaskar (0), Xiuyuan Cheng (1), Alexander Cloninger (2)
0: Institute of Mathematical Sciences, Claremont Graduate University, Claremont, CA, United States
1: Department of Mathematics, Duke University, Durham, NC, United States
2: Department of Mathematics and Halicioglu Data Science Institute, University of California, San Diego, San Diego, CA, United States
title A Witness Function Based Construction of Discriminative Models Using Hermite Polynomials
topic generative model
discriminative model
probability estimation
Hermite functions
witness function
68Q32
url https://www.frontiersin.org/article/10.3389/fams.2020.00031/full