Generative modelling: addressing open problems in model misspecification and differential privacy


Bibliographic Details
Main Author: Ghalebikesabi, S
Other Authors: Holmes, C
Format: Thesis
Language: English
Published: 2023
Summary:<p>Generative modelling has become a popular application of artificial intelligence. Model performance can, however, suffer when the generative model is misspecified, or when the generative model estimator is modified to adhere to a privacy notion such as differential privacy. In this thesis, we approach generative modelling under model misspecification and differential privacy through four contributions.</p>

<p>We first review related work on generative modelling, and then examine why generative modelling demands scrutiny under the twin challenges of model misspecification and differential privacy.</p>

<p>As a first contribution, we consider generative modelling for density estimation. One way to address model misspecification is to relax model assumptions, and we show that this helps even in nonparametric models. In particular, we study a recently proposed nonparametric quasi-Bayesian density estimator and identify its strong model assumptions as a cause of its poor performance on finite data sets. We propose an autoregressive extension that relaxes these assumptions to allow for a priori feature dependencies.</p>

<p>Next, we consider generative modelling for missing-data imputation. After categorising current deep generative imputation approaches into the classes of nonignorable missingness models introduced by Rubin [1976], we extend the variational autoencoder formulation to factorise according to a nonignorable missingness model class not previously studied in the deep generative modelling literature. These models represent the missingness mechanism explicitly to guard against misspecification when data are missing not at random.</p>

<p>We then turn to improving synthetic data generation under differential privacy. To this end, we propose differentially private importance sampling of differentially private synthetic data samples, and we observe that the better the generative model, the more importance sampling helps. We next focus on raising data generation quality with differentially private diffusion models, and identify training strategies that significantly improve the performance of DP image generators.</p>

<p>We conclude the dissertation with a discussion of the contributions and limitations of the presented work, and propose directions for future work.</p>
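As a minimal illustration of the importance-sampling idea behind the synthetic-data contribution (without the differential-privacy machinery, which would additionally require privatising the weights), the following sketch reweights samples drawn from a deliberately misspecified Gaussian "generative model" toward a target distribution. The two Gaussians and all parameters are invented for illustration and are not taken from the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Target p = N(0, 1); misspecified "generative model" q = N(1, 1).
# Self-normalised importance weights w ∝ p(x)/q(x) correct estimates
# of expectations under p using only samples from q.
def log_normal_pdf(x, mu, sigma):
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

x = rng.normal(1.0, 1.0, size=100_000)            # "synthetic" samples from q
log_w = log_normal_pdf(x, 0.0, 1.0) - log_normal_pdf(x, 1.0, 1.0)
w = np.exp(log_w - log_w.max())                   # stabilise before normalising
w /= w.sum()                                      # self-normalised weights

naive = x.mean()            # biased: estimates E_q[X] = 1 instead of E_p[X] = 0
corrected = np.sum(w * x)   # importance-weighted estimate of E_p[X] = 0
```

The correction is only as good as the weights: when the generative model is close to the target, the weights are nearly uniform and the reweighted estimator has low variance, which matches the abstract's observation that importance sampling helps more the better the generative model is.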