Safe Optimal Control of Dynamic Systems: Learning from Experts and Safely Exploring New Policies

Bibliographic Details
Main Authors: Antonio Candelieri, Andrea Ponti, Elisabetta Fersini, Enza Messina, Francesco Archetti
Format: Article
Language: English
Published: MDPI AG, 2023-10-01
Series: Mathematics
Subjects: optimal control; safe exploration; Gaussian Processes
Online Access: https://www.mdpi.com/2227-7390/11/20/4347
_version_ 1797573081321963520
author Antonio Candelieri
Andrea Ponti
Elisabetta Fersini
Enza Messina
Francesco Archetti
author_facet Antonio Candelieri
Andrea Ponti
Elisabetta Fersini
Enza Messina
Francesco Archetti
author_sort Antonio Candelieri
collection DOAJ
description Many real-life systems are controlled through policies that replicate experts' knowledge, typically favouring "safety" at the expense of optimality. Indeed, these control policies usually aim at avoiding a system's disruptions or deviations from a target behaviour, leading to suboptimal performance. This paper proposes a statistical learning approach that exploits the historical safe experience, collected by applying a safe control policy based on experts' knowledge, to "safely explore" new and more efficient policies. The basic idea is that performance can be improved by accepting a reasonable and quantifiable safety risk. The proposed approach relies on Gaussian Process regression to obtain a probabilistic model of both the system's dynamics and its performance from the historical safe experience. The new policy is obtained by solving a constrained optimization problem in which two Gaussian Processes model, respectively, the safety constraints and the performance metric (i.e., the objective function). As a probabilistic model, Gaussian Process regression provides an estimate of the target variable along with the associated uncertainty; this property is crucial for handling uncertainty while new policies are safely explored. Another important benefit is that the proposed approach does not require an expensive digital twin of the original system. Results on two real-life systems are presented, empirically showing that the approach improves performance over the initial safe policy without significantly affecting safety.
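The two-Gaussian-Process scheme summarized in the description can be sketched as follows: one GP is fitted to the performance metric and one to a safety margin, both from the historical safe experience, and a new policy is proposed by optimizing an optimistic performance estimate subject to a pessimistic safety bound. This is a minimal illustrative sketch only; the synthetic data, the one-dimensional policy parameterization, the confidence multiplier `beta`, and the zero safety threshold are all assumptions for illustration, not the paper's exact formulation.

```python
# Illustrative sketch (not the authors' implementation): safe policy
# exploration with two Gaussian Process surrogates.
import numpy as np
from scipy.optimize import minimize, NonlinearConstraint
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Hypothetical historical "safe experience": policy parameters X,
# observed performance f(X) (to maximize), and a safety margin g(X)
# (the system is considered safe when g(X) >= 0). Synthetic data.
X = rng.uniform(-2.0, 2.0, size=(30, 1))
f = -(X[:, 0] - 1.0) ** 2          # performance metric
g = 1.5 - np.abs(X[:, 0])          # safety margin

gp_perf = GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(X, f)
gp_safe = GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(X, g)

beta = 2.0  # confidence multiplier: larger values explore more cautiously

def neg_perf_ucb(x):
    # Optimistic (upper confidence bound) performance estimate, negated
    # because scipy minimizes.
    mu, sigma = gp_perf.predict(x.reshape(1, -1), return_std=True)
    return -(mu[0] + beta * sigma[0])

def safety_lcb(x):
    # Pessimistic (lower confidence bound) safety-margin estimate;
    # requiring it to stay >= 0 bounds the risk taken while exploring.
    mu, sigma = gp_safe.predict(x.reshape(1, -1), return_std=True)
    return mu[0] - beta * sigma[0]

constraint = NonlinearConstraint(safety_lcb, 0.0, np.inf)
x0 = X[np.argmax(f)]  # start from the best policy in the safe history
res = minimize(neg_perf_ucb, x0, bounds=[(-2.0, 2.0)],
               constraints=[constraint])
print("candidate policy:", res.x,
      "pessimistic safety margin:", safety_lcb(res.x))
```

Because the GP uncertainty enters both the objective (optimistically) and the constraint (pessimistically), the candidate policy can only move away from the historical data where the safety model remains confidently non-negative, which is the "reasonable and quantifiable risk" trade-off described in the abstract.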
first_indexed 2024-03-10T21:04:36Z
format Article
id doaj.art-4be0a83a5c5f40b2b6d64e567a63a159
institution Directory Open Access Journal
issn 2227-7390
language English
last_indexed 2024-03-10T21:04:36Z
publishDate 2023-10-01
publisher MDPI AG
record_format Article
series Mathematics
spelling doaj.art-4be0a83a5c5f40b2b6d64e567a63a159 | 2023-11-19T17:14:44Z | eng | MDPI AG | Mathematics | 2227-7390 | 2023-10-01 | vol. 11, no. 20, art. 4347 | doi: 10.3390/math11204347
Safe Optimal Control of Dynamic Systems: Learning from Experts and Safely Exploring New Policies
Antonio Candelieri (Department of Economics Management and Statistics, University of Milano-Bicocca, 20126 Milan, Italy)
Andrea Ponti (Department of Economics Management and Statistics, University of Milano-Bicocca, 20126 Milan, Italy)
Elisabetta Fersini (Department of Computer Science Systems and Communication, University of Milano-Bicocca, 20126 Milan, Italy)
Enza Messina (Department of Computer Science Systems and Communication, University of Milano-Bicocca, 20126 Milan, Italy)
Francesco Archetti (Department of Computer Science Systems and Communication, University of Milano-Bicocca, 20126 Milan, Italy)
(abstract repeated verbatim; see the description field)
https://www.mdpi.com/2227-7390/11/20/4347
optimal control; safe exploration; Gaussian Processes
spellingShingle Antonio Candelieri
Andrea Ponti
Elisabetta Fersini
Enza Messina
Francesco Archetti
Safe Optimal Control of Dynamic Systems: Learning from Experts and Safely Exploring New Policies
Mathematics
optimal control
safe exploration
Gaussian Processes
title Safe Optimal Control of Dynamic Systems: Learning from Experts and Safely Exploring New Policies
title_full Safe Optimal Control of Dynamic Systems: Learning from Experts and Safely Exploring New Policies
title_fullStr Safe Optimal Control of Dynamic Systems: Learning from Experts and Safely Exploring New Policies
title_full_unstemmed Safe Optimal Control of Dynamic Systems: Learning from Experts and Safely Exploring New Policies
title_short Safe Optimal Control of Dynamic Systems: Learning from Experts and Safely Exploring New Policies
title_sort safe optimal control of dynamic systems learning from experts and safely exploring new policies
topic optimal control
safe exploration
Gaussian Processes
url https://www.mdpi.com/2227-7390/11/20/4347
work_keys_str_mv AT antoniocandelieri safeoptimalcontrolofdynamicsystemslearningfromexpertsandsafelyexploringnewpolicies
AT andreaponti safeoptimalcontrolofdynamicsystemslearningfromexpertsandsafelyexploringnewpolicies
AT elisabettafersini safeoptimalcontrolofdynamicsystemslearningfromexpertsandsafelyexploringnewpolicies
AT enzamessina safeoptimalcontrolofdynamicsystemslearningfromexpertsandsafelyexploringnewpolicies
AT francescoarchetti safeoptimalcontrolofdynamicsystemslearningfromexpertsandsafelyexploringnewpolicies