Learning Kernel Stein Discrepancy for Training Energy-Based Models


Bibliographic Details
Main Authors: Lu Niu, Shaobo Li, Zhenping Li
Format: Article
Language: English
Published: MDPI AG 2023-11-01
Series: Applied Sciences
Subjects: hypothesis testing; energy-based model; Kernel Stein Discrepancy; Maximum Mean Discrepancy
Online Access: https://www.mdpi.com/2076-3417/13/22/12293
author Lu Niu
Shaobo Li
Zhenping Li
collection DOAJ
description A central challenge in unsupervised learning is training unnormalized density models and then generating samples that resemble the training data. Traditional unnormalized models rarely provide a direct measure of the quality of the trained model, because most are evaluated through downstream tasks and often involve complex sampling processes. Kernel Stein Discrepancy (KSD), a goodness-of-fit test method, can measure the discrepancy between the generated samples and the theoretical distribution; therefore, it can be employed to measure the quality of trained models. We first demonstrate that, under certain constraints, KSD is equal to Maximum Mean Discrepancy (MMD), a two-sample test method. We then propose PT KSD GAN (Kernel Stein Discrepancy Generative Adversarial Network with a Pulling-Away Term), which compels generated samples to approximate the theoretical distribution. The generator, functioning as an implicit generative model, employs KSD as its loss to avoid tedious sampling processes. In contrast, the discriminator, which serves as an explicit energy-based model, is trained to identify the data manifold. To demonstrate the effectiveness of our approach, we ran experiments on two-dimensional toy datasets. The results show that the generator captures the density distribution accurately, while the discriminator recognizes the approximate shape of the unnormalized distribution. On linear Independent Component Analysis datasets, the log likelihoods of PT KSD GAN improve by about 5‰ over existing methods when the data dimension is less than 30. Furthermore, tests on image datasets indicate that PT KSD GAN handles high-dimensional data well and yields realistic samples.
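The description says KSD measures the discrepancy between generated samples and a theoretical distribution without sampling from that distribution. A minimal sketch of that idea is given here, and it is not the authors' implementation: the record does not state their kernel, estimator, or network. The one-dimensional setting, the RBF kernel with bandwidth `h`, the V-statistic estimator, the standard normal target, and all function names are assumptions chosen for illustration.

```python
import math
import random


def stein_kernel(x, y, score, h=1.0):
    """Stein kernel u_p(x, y) in 1-D for an RBF base kernel (assumed choice).

    With k(x, y) = exp(-(x - y)^2 / (2 h^2)) and score s = d/dx log p:
    u_p = s(x) s(y) k + s(x) dk/dy + s(y) dk/dx + d^2 k / (dx dy).
    """
    d = x - y
    k = math.exp(-d * d / (2.0 * h * h))
    sx, sy = score(x), score(y)
    return k * (sx * sy + (sx - sy) * d / h**2 + 1.0 / h**2 - d * d / h**4)


def ksd_squared(samples, score, h=1.0):
    """V-statistic estimate of squared KSD: mean of u_p over all sample pairs."""
    n = len(samples)
    total = sum(stein_kernel(x, y, score, h) for x in samples for y in samples)
    return total / (n * n)


def standard_normal_score(x):
    """Score of the assumed target N(0, 1); needs no normalizing constant."""
    return -x


rng = random.Random(0)
matched = [rng.gauss(0.0, 1.0) for _ in range(200)]  # drawn from the target
shifted = [rng.gauss(2.0, 1.0) for _ in range(200)]  # drawn from a shifted law

ksd_matched = ksd_squared(matched, standard_normal_score)
ksd_shifted = ksd_squared(shifted, standard_normal_score)
print("matched:", ksd_matched, "shifted:", ksd_shifted)
```

Only the score function of the target enters the estimate, which is why the quantity can be computed for an unnormalized model. The shifted samples should give a clearly larger value than the matched ones.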
first_indexed 2024-03-09T17:03:24Z
format Article
id doaj.art-4b86155de8f84c558b9626833dd53635
institution Directory Open Access Journal
issn 2076-3417
language English
last_indexed 2024-03-09T17:03:24Z
publishDate 2023-11-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-4b86155de8f84c558b9626833dd53635
2023-11-24T14:27:03Z
eng
MDPI AG
Applied Sciences, ISSN 2076-3417
2023-11-01, volume 13, issue 22, article 12293
10.3390/app132212293
Learning Kernel Stein Discrepancy for Training Energy-Based Models
Lu Niu, Chengdu Institute of Computer Applications, Chinese Academy of Sciences, Chengdu 610041, China
Shaobo Li, Chengdu Institute of Computer Applications, Chinese Academy of Sciences, Chengdu 610041, China
Zhenping Li, Chengdu Institute of Computer Applications, Chinese Academy of Sciences, Chengdu 610041, China
https://www.mdpi.com/2076-3417/13/22/12293
hypothesis testing
energy-based model
Kernel Stein Discrepancy
Maximum Mean Discrepancy
title Learning Kernel Stein Discrepancy for Training Energy-Based Models
topic hypothesis testing
energy-based model
Kernel Stein Discrepancy
Maximum Mean Discrepancy
url https://www.mdpi.com/2076-3417/13/22/12293