Testing and learning on distributional and set inputs


Bibliographic details
Main author:	Law, H
Other authors:	Sejdinovic, D
Format:	Thesis
Language:	English
Published:	2019
Subjects:	Statistics Machine learning

Description
Abstract:	As machine learning gains significant attention in many disciplines and research communities, the variety of data structures has increased, with examples including distributions and sets of observations. In this thesis, we consider sets and distributions as inputs for machine learning problems. In particular, we propose non-parametric tests, supervised learning, semi-supervised learning and metalearning methodologies on these objects. In each case, with careful consideration of the input structure, we construct models that are applicable to various real-life tasks.

We begin by considering the problem of weakly supervised learning on aggregate outputs, where the labels are only available at a much coarser resolution than the level of inputs, such that a set of inputs corresponds to each output. Constructing a tractable and scalable framework of aggregated observation models using Gaussian processes, we apply it to the important problem of fine-scale spatial modelling of malaria incidence. In particular, we demonstrate that the prediction of unobserved pixel-level malaria intensities is possible using fine-scale environmental covariates.

Utilising the same data structure, but with the interpretation that the set of samples is drawn from a distribution, we consider the problem of modelling distributions in the context of hyperparameter selection for supervised learning tasks. Through transfer of information from previously solved tasks using learnt representations of the training datasets, we construct a Gaussian process framework that jointly models all the meta-information available. In application to a range of regression and classification tasks, we demonstrate that we achieve faster convergence compared to state-of-the-art baselines.
