Distribution regression for sequential data

Distribution regression refers to the supervised learning problem where labels are only available for groups of inputs instead of individual inputs. In this paper, we develop a rigorous mathematical framework for distribution regression where inputs are complex data streams. Leveraging properties of...


Bibliographic details
Main authors: Lemercier, M, Salvi, C, Damoulas, T, Bonilla, EV, Lyons, T
Format: Internet publication
Language: English
Published: 2020
collection OXFORD
description Distribution regression refers to the supervised learning problem where labels are only available for groups of inputs instead of individual inputs. In this paper, we develop a rigorous mathematical framework for distribution regression where inputs are complex data streams. Leveraging properties of the expected signature and a recent signature kernel trick for sequential data from stochastic analysis, we introduce two new learning techniques, one feature-based and the other kernel-based. Each is suited to a different data regime in terms of the number of data streams and the dimensionality of the individual streams. We provide theoretical results on the universality of both approaches and demonstrate empirically their robustness to irregularly sampled multivariate time-series, achieving state-of-the-art performance on both synthetic and real-world examples from thermodynamics, mathematical finance and agricultural science.
first_indexed 2024-04-09T03:58:47Z
id oxford-uuid:ff19fd39-905e-4ae1-a467-2a4098f8c715
institution University of Oxford
last_indexed 2024-04-09T03:58:47Z
record_format dspace