Probabilistic reach-avoid for Bayesian neural networks

Model-based reinforcement learning seeks to simultaneously learn the dynamics of an unknown stochastic environment and synthesise an optimal policy for acting in it. Ensuring the safety and robustness of sequential decisions made through a policy in such an environment is a key challenge for policies intended for safety-critical scenarios. In this work, we investigate two complementary problems: first, computing reach-avoid probabilities for iterative predictions made with dynamical models whose dynamics are described by a Bayesian neural network (BNN); second, synthesising control policies that are optimal with respect to a given reach-avoid specification (reaching a “target” state while avoiding a set of “unsafe” states) and a learned BNN model. Our solution leverages interval propagation and backward recursion techniques to compute lower bounds on the probability that a policy's sequence of actions satisfies the reach-avoid specification. These lower bounds provide safety certification for the given policy and BNN model. We then introduce control synthesis algorithms that derive policies maximising these lower bounds on the safety probability. We demonstrate the effectiveness of our method on a series of control benchmarks characterised by learned BNN dynamics models. On our most challenging benchmark, compared to purely data-driven policies, the optimal synthesis algorithm provides more than a four-fold increase in the number of certifiable states and more than a three-fold increase in the average guaranteed reach-avoid probability.
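
The backward recursion the abstract refers to can be written in standard reach-avoid notation (the notation here is ours, not necessarily the paper's). With safe set S, target set G ⊆ S, horizon N, policy π, and BNN predictive transition kernel p(· | x, u), define

\[
  V_N(x) = \mathbf{1}_G(x), \qquad
  V_k(x) = \mathbf{1}_G(x) + \mathbf{1}_{S \setminus G}(x)\,
           \mathbb{E}_{x' \sim p(\cdot \mid x,\, \pi_k(x))}\!\left[V_{k+1}(x')\right],
  \quad k = N-1, \dots, 0,
\]

so that V_0(x_0) is the reach-avoid probability from initial state x_0; propagating intervals through the BNN replaces the expectation with a sound lower bound, and synthesis maximises that bound over actions. As an illustration only, the following Python sketch runs this recursion over a discretised one-dimensional state space. The transition bound trans_lb, the grid, the nominal dynamics, and the region definitions are all assumptions invented for the example; the paper's actual bound comes from interval propagation through the BNN posterior.

import numpy as np

states = np.linspace(-2.0, 2.0, 81)       # discretised 1-D state space
actions = np.linspace(-1.0, 1.0, 5)       # finite action set
N = 10                                    # time horizon
in_target = np.abs(states - 1.5) <= 0.25  # hypothetical "target" region G
in_unsafe = np.abs(states + 1.0) <= 0.25  # hypothetical "unsafe" region

def trans_lb(x, u):
    # Stand-in for the BNN interval-propagation bound: a sub-stochastic
    # vector that lower-bounds the transition probabilities to each cell.
    mean = x + 0.3 * u                    # assumed nominal dynamics
    p = np.exp(-0.5 * ((states - mean) / 0.2) ** 2)
    return 0.9 * p / p.sum()              # crude deflation, illustrative only

V = in_target.astype(float)               # V_N: 1 on the target, 0 elsewhere
policy = np.zeros((N, len(states)), dtype=int)
for k in reversed(range(N)):              # backward recursion
    V_new = np.zeros_like(V)
    for i, x in enumerate(states):
        if in_target[i]:
            V_new[i] = 1.0                # already in the target
        elif not in_unsafe[i]:
            # synthesis step: pick the action maximising the lower bound
            vals = [trans_lb(x, u) @ V for u in actions]
            policy[k, i] = int(np.argmax(vals))
            V_new[i] = max(vals)
    V = V_new                             # unsafe states stay at 0

x0 = np.argmin(np.abs(states))            # initial state closest to 0
print("certified lower bound from x0 = 0:", V[x0])

Because trans_lb only under-approximates the transition probabilities, the printed value is a lower bound on the reach-avoid probability under this toy model, mirroring the certification guarantee described in the abstract.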

Bibliographic Details
Main Authors: Wicker, M; Laurenti, L; Patane, A; Paoletti, N; Abate, A; Kwiatkowska, M
Format: Journal article
Language: English
Published: Elsevier, 2024