Transforming kernel-based learners to incorporate domain knowledge from climate science
In the face of persistent modelling and observational challenges in climate science, which hinder our understanding of the climate system, statistical machine learning has emerged as a potential ally in recent years. Modern machine learning methods promise to leverage the vast volumes of da...
Main Author: | Bouabid, S |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2024 |
_version_ | 1824459323076509696 |
---|---|
author | Bouabid, S |
author_facet | Bouabid, S |
author_sort | Bouabid, S |
collection | OXFORD |
description | <p>In the face of persistent modelling and observational challenges in climate science, which hinder our understanding of the climate system, statistical machine learning has emerged as a potential ally in recent years. Modern machine learning methods promise to leverage the vast volumes of data from climate model simulations, satellite imagery, or in situ measurements to advance our understanding of the climate system and, thereby, our ability to anticipate the adverse consequences of climate change. However, without concerted efforts to align the use of machine learning with the needs of the climate science community, this promise may lead to disappointment through wasted resources and unmet expectations.</p>
<br>
<p>In this thesis, we propose a set of guiding principles for designing machine learning models that align with the expectations of the climate science community. These guidelines include using models that are meaningfully specified for the problem at hand, preferring mathematically transparent models, emphasising probabilistic modelling, and incorporating domain knowledge. We then focus on kernel-based learners, a class of machine learning algorithms based on similarity measures between data points that fits well with these guidelines, and provide relevant background on their application to regression tasks.</p>
<br>
<p>The core of this thesis comprises three studies demonstrating how kernel-based learners can help address challenges in climate science. In Chapter 3, we take a methodological angle and contribute a framework, with theoretical guarantees, for incorporating into regression a particular kind of domain knowledge that can arise in climate science: knowledge of the causal structure underlying the data-generating process. In Chapter 4, we consider an applied modelling challenge in climate science: the development of cheap surrogates of computationally expensive climate models, called climate model emulators. We demonstrate how incorporating Gaussian process (GP) modelling into a physically motivated energy balance model allows us to formulate a simple probabilistic emulator of surface temperatures. This emulator, which we call FaIRGP, can learn from data and outperform purely process-based emulators while retaining the robustness of the incorporated energy balance model. In Chapter 5, we consider an applied observational challenge in climate science: obtaining global estimates of aerosol vertical profiles. We propose a Bayesian model, based on GPs and heuristics from satellite aerosol retrieval algorithms, that infers aerosol vertical extinction profiles from aerosol optical depth measurements and vertically resolved meteorological data.</p>
<br>
<p>Finally, we conclude in Chapter 6 by reflecting on the work presented, discussing its limitations, and outlining directions for future work.</p> |
first_indexed | 2025-02-19T04:39:57Z |
format | Thesis |
id | oxford-uuid:ae4060bd-2e2a-48c4-9c4b-fa1ae2f504a4 |
institution | University of Oxford |
language | English |
last_indexed | 2025-02-19T04:39:57Z |
publishDate | 2024 |
record_format | dspace |
spelling | oxford-uuid:ae4060bd-2e2a-48c4-9c4b-fa1ae2f504a4 2025-02-17T13:45:19Z Transforming kernel-based learners to incorporate domain knowledge from climate science Thesis http://purl.org/coar/resource_type/c_db06 uuid:ae4060bd-2e2a-48c4-9c4b-fa1ae2f504a4 English Hyrax Deposit 2024 Bouabid, S |
spellingShingle | Bouabid, S Transforming kernel-based learners to incorporate domain knowledge from climate science |
title | Transforming kernel-based learners to incorporate domain knowledge from climate science |
title_full | Transforming kernel-based learners to incorporate domain knowledge from climate science |
title_fullStr | Transforming kernel-based learners to incorporate domain knowledge from climate science |
title_full_unstemmed | Transforming kernel-based learners to incorporate domain knowledge from climate science |
title_short | Transforming kernel-based learners to incorporate domain knowledge from climate science |
title_sort | transforming kernel based learners to incorporate domain knowledge from climate science |
work_keys_str_mv | AT bouabids transformingkernelbasedlearnerstoincorporatedomainknowledgefromclimatescience |