Safe Optimal Control of Dynamic Systems: Learning from Experts and Safely Exploring New Policies
Many real-life systems are usually controlled through policies replicating experts’ knowledge, typically favouring “safety” at the expense of optimality. Indeed, these control policies are usually aimed at avoiding a system’s disruptions or deviations from a target behaviour, leading to suboptimal performances.
Main Authors: | Antonio Candelieri, Andrea Ponti, Elisabetta Fersini, Enza Messina, Francesco Archetti |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-10-01 |
Series: | Mathematics |
Subjects: | optimal control; safe exploration; Gaussian Processes |
Online Access: | https://www.mdpi.com/2227-7390/11/20/4347 |
_version_ | 1797573081321963520 |
---|---|
author | Antonio Candelieri; Andrea Ponti; Elisabetta Fersini; Enza Messina; Francesco Archetti |
author_facet | Antonio Candelieri; Andrea Ponti; Elisabetta Fersini; Enza Messina; Francesco Archetti |
author_sort | Antonio Candelieri |
collection | DOAJ |
description | Many real-life systems are usually controlled through policies replicating experts’ knowledge, typically favouring “safety” at the expense of optimality. Indeed, these control policies are usually aimed at avoiding a system’s disruptions or deviations from a target behaviour, leading to suboptimal performances. This paper proposes a statistical learning approach to exploit the historical safe experience—collected through the application of a safe control policy based on experts’ knowledge— to “safely explore” new and more efficient policies. The basic idea is that performances can be improved by facing a reasonable and quantifiable risk in terms of safety. The proposed approach relies on Gaussian Process regression to obtain a probabilistic model of both a system’s dynamics and performances, depending on the historical safe experience. The new policy consists of solving a constrained optimization problem, with two Gaussian Processes modelling, respectively, the safety constraints and the performance metric (i.e., objective function). As a probabilistic model, Gaussian Process regression provides an estimate of the target variable and the associated uncertainty; this property is crucial for dealing with uncertainty while new policies are safely explored. Another important benefit is that the proposed approach does not require any implementation of an expensive digital twin of the original system. Results on two real-life systems are presented, empirically proving the ability of the approach to improve performances with respect to the initial safe policy without significantly affecting safety. |
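The constrained optimization described above (one Gaussian Process modelling the safety constraint, another modelling the performance metric, with the GP uncertainty used to quantify the risk of leaving the safe region) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 1-D toy system, the `beta`/`threshold` values, and the function name `next_safe_policy` are all illustrative choices.

```python
# Sketch of safe exploration with two Gaussian Processes: one GP models the
# performance metric (objective), the other the safety constraint. A candidate
# policy is considered only if the constraint GP's pessimistic estimate
# (mean - beta * std) stays above a safety threshold. Toy data stand in for
# the "historical safe experience" collected under the expert policy.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Historical safe experience: policy parameters tried by the conservative
# expert policy, with observed performance and safety margin (>0 means safe).
X = rng.uniform(0.0, 0.5, size=(20, 1))
perf = np.sin(3 * X[:, 0]) + 0.05 * rng.standard_normal(20)
safety = 1.0 - X[:, 0] + 0.05 * rng.standard_normal(20)

gp_perf = GaussianProcessRegressor(kernel=RBF(0.2), alpha=1e-2).fit(X, perf)
gp_safe = GaussianProcessRegressor(kernel=RBF(0.2), alpha=1e-2).fit(X, safety)

def next_safe_policy(candidates, beta=2.0, threshold=0.0):
    """Maximise predicted performance over candidates whose pessimistic
    safety estimate (mean - beta*std) exceeds the threshold."""
    mu_s, sd_s = gp_safe.predict(candidates, return_std=True)
    feasible = (mu_s - beta * sd_s) > threshold
    if not feasible.any():
        return None  # no candidate is provably safe at this confidence level
    mu_p = gp_perf.predict(candidates)
    mu_p[~feasible] = -np.inf  # rule out unsafe candidates
    return candidates[int(np.argmax(mu_p))]

grid = np.linspace(0.0, 1.0, 101).reshape(-1, 1)
best = next_safe_policy(grid)
```

The larger `beta` is, the more pessimistic the safety estimate and the closer the selected policy stays to the expert's region; shrinking `beta` trades quantified safety risk for potential performance gains, which is the core idea of the paper.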
first_indexed | 2024-03-10T21:04:36Z |
format | Article |
id | doaj.art-4be0a83a5c5f40b2b6d64e567a63a159 |
institution | Directory Open Access Journal |
issn | 2227-7390 |
language | English |
last_indexed | 2024-03-10T21:04:36Z |
publishDate | 2023-10-01 |
publisher | MDPI AG |
record_format | Article |
series | Mathematics |
spelling | doaj.art-4be0a83a5c5f40b2b6d64e567a63a159 2023-11-19T17:14:44Z; eng; MDPI AG; Mathematics, 2227-7390, 2023-10-01, 11(20), 4347; 10.3390/math11204347; Safe Optimal Control of Dynamic Systems: Learning from Experts and Safely Exploring New Policies; Antonio Candelieri, Andrea Ponti (Department of Economics Management and Statistics, University of Milano-Bicocca, 20126 Milan, Italy); Elisabetta Fersini, Enza Messina, Francesco Archetti (Department of Computer Science Systems and Communication, University of Milano-Bicocca, 20126 Milan, Italy); https://www.mdpi.com/2227-7390/11/20/4347; optimal control; safe exploration; Gaussian Processes |
spellingShingle | Antonio Candelieri Andrea Ponti Elisabetta Fersini Enza Messina Francesco Archetti Safe Optimal Control of Dynamic Systems: Learning from Experts and Safely Exploring New Policies Mathematics optimal control safe exploration Gaussian Processes |
title | Safe Optimal Control of Dynamic Systems: Learning from Experts and Safely Exploring New Policies |
title_full | Safe Optimal Control of Dynamic Systems: Learning from Experts and Safely Exploring New Policies |
title_fullStr | Safe Optimal Control of Dynamic Systems: Learning from Experts and Safely Exploring New Policies |
title_full_unstemmed | Safe Optimal Control of Dynamic Systems: Learning from Experts and Safely Exploring New Policies |
title_short | Safe Optimal Control of Dynamic Systems: Learning from Experts and Safely Exploring New Policies |
title_sort | safe optimal control of dynamic systems learning from experts and safely exploring new policies |
topic | optimal control; safe exploration; Gaussian Processes |
url | https://www.mdpi.com/2227-7390/11/20/4347 |
work_keys_str_mv | AT antoniocandelieri safeoptimalcontrolofdynamicsystemslearningfromexpertsandsafelyexploringnewpolicies AT andreaponti safeoptimalcontrolofdynamicsystemslearningfromexpertsandsafelyexploringnewpolicies AT elisabettafersini safeoptimalcontrolofdynamicsystemslearningfromexpertsandsafelyexploringnewpolicies AT enzamessina safeoptimalcontrolofdynamicsystemslearningfromexpertsandsafelyexploringnewpolicies AT francescoarchetti safeoptimalcontrolofdynamicsystemslearningfromexpertsandsafelyexploringnewpolicies |