Testing and learning on distributional and set inputs

Bibliographic details
Main author: Law, H
Contributors: Sejdinovic, D
Format: Thesis
Language: English
Published / Created: 2019
Subjects: Statistics; Machine learning
author Law, H
author2 Sejdinovic, D
collection OXFORD
description <p>As machine learning gains attention across many disciplines and research communities, the variety of data structures it must handle has grown, including distributions and sets of observations. In this thesis, we consider sets and distributions as inputs to machine learning problems. In particular, we propose non-parametric tests, supervised learning, semi-supervised learning and meta-learning methodologies for these objects. In each case, by carefully accounting for the input structure, we construct models that apply to a variety of real-life tasks.</p> <p>We begin with the problem of <em>weakly supervised learning on aggregate outputs</em>, where labels are available only at a much coarser resolution than the inputs, so that each output corresponds to a set of inputs. We construct a tractable and scalable framework of aggregated observation models using Gaussian processes and apply it to the important problem of fine-scale spatial modelling of malaria incidence. In particular, we demonstrate that unobserved pixel-level malaria intensities can be predicted using fine-scale environmental covariates.</p> <p>Using the same data structure, but interpreting each set of samples as drawn from a distribution, we consider the problem of modelling distributions in the context of hyperparameter selection for supervised learning tasks. By transferring information from previously solved tasks through learnt representations of the training datasets, we construct a Gaussian process framework that jointly models all available meta-information. Across a range of regression and classification tasks, we demonstrate faster convergence than state-of-the-art baselines.</p>
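To make the aggregate-output setting in the description concrete, here is a minimal sketch of an aggregated observation likelihood; it is an illustration, not the thesis's model. It assumes Poisson-distributed bag-level counts whose rate is the population-weighted sum of pixel-level intensities, and it replaces the Gaussian-process latent field with a hypothetical log-linear function `exp(b0 + b1 * x)` of a single covariate so the example stays self-contained. The names `pixel_intensity`, `bag_intensity` and `bag_log_likelihood`, and the toy data, are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

# A bag is one coarse region: a list of (covariate, population weight) pairs,
# one per pixel. Only the bag as a whole has an observed label.
Bag = List[Tuple[float, float]]
Theta = Tuple[float, float]


def pixel_intensity(x: float, theta: Theta) -> float:
    """Hypothetical log-linear pixel-level intensity exp(b0 + b1 * x).

    Stands in for the Gaussian-process latent field; this is an assumption
    made only to keep the sketch short.
    """
    b0, b1 = theta
    return math.exp(b0 + b1 * x)


def bag_intensity(bag: Bag, theta: Theta) -> float:
    """Aggregate pixel intensities into a bag-level rate (population-weighted sum)."""
    return sum(w * pixel_intensity(x, theta) for x, w in bag)


def bag_log_likelihood(bags: Sequence[Bag], counts: Sequence[float], theta: Theta) -> float:
    """Poisson log-likelihood of bag-level counts, dropping the constant log(y!) term."""
    total = 0.0
    for bag, y in zip(bags, counts):
        lam = bag_intensity(bag, theta)
        total += y * math.log(lam) - lam
    return total


if __name__ == "__main__":
    # Two toy regions with illustrative covariates and populations.
    bags = [
        [(0.1, 100.0), (0.5, 200.0)],
        [(1.0, 50.0), (1.5, 80.0), (2.0, 30.0)],
    ]
    true_theta = (-3.0, 0.8)
    # Noise-free bag counts, used only to show the likelihood peaks at true_theta.
    counts = [bag_intensity(b, true_theta) for b in bags]
    print(bag_log_likelihood(bags, counts, true_theta)
          > bag_log_likelihood(bags, counts, (-3.0, 0.2)))  # True
    # Pixel-level prediction for a pixel that never had its own label.
    print(pixel_intensity(1.2, true_theta))
```

Only bag-level counts enter the likelihood, yet the same parameters define `pixel_intensity`, so parameters fitted by maximising the bag-level likelihood can be used to predict intensities for individual, unlabelled pixels. The thesis pursues this with a Gaussian-process latent field rather than the fixed parametric form assumed here.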
first_indexed 2024-03-07T03:00:21Z
format Thesis
id oxford-uuid:b0c17cd9-a0f0-4c10-a5f5-b59e2c924e9e
institution University of Oxford
language English
last_indexed 2024-03-07T03:00:21Z
publishDate 2019
record_format dspace
title Testing and learning on distributional and set inputs
topic Statistics
Machine learning