Safe reinforcement learning in automotive
This work applies the Logically Constrained Reinforcement Learning (LCRL) framework to synthesize policies for active monitoring and management of sensor modules in autonomous vehicles. LCRL synthesizes policies for unknown and continuous-state Markov Decision Processes (MDPs) such that a given linear-time property is satisfied. We frame the problem of dynamic sensor module selection in automotive systems as an MDP and express the constraints as a Linear Temporal Logic (LTL) property. The LTL property is used to guard and guide the MDP in finding an optimal policy: defining a reward function over the state-action pairs of the MDP based on the LTL property results in learning in a constrained environment. This approach improves performance and scalability in finding the right monitoring policies, as the safety properties enforce bounds on the search space.
Main Author: | Shah, A |
---|---|
Other Authors: | Abate, A; Hasanbeig, H; Pathak, S |
Format: | Thesis |
Language: | English |
Published: | 2020 |
Institution: | University of Oxford |
Subjects: | Artificial intelligence; Logic, Symbolic and mathematical; Reinforcement Learning; Deep learning (Machine learning); Automated vehicles |
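The abstract describes deriving the reward from an LTL property: the learner runs on the product of the MDP and an automaton for the property, and reward is paid only on accepting automaton transitions. The following is a minimal sketch of that reward-shaping idea, not the thesis's actual implementation: the toy sensor-selection dynamics, the action names, and the one-state automaton for the illustrative property "GF check" (a health check happens infinitely often) are all assumptions introduced here.

```python
# Minimal sketch of LCRL-style reward shaping (illustrative assumptions only):
# tabular Q-learning on the product of a toy sensor-selection MDP and a
# one-state Buechi automaton for the LTL property "GF check".
import random

random.seed(0)

N_STATES = 5                                  # abstract vehicle/sensor-context states
ACTIONS = ["use_module_a", "use_module_b", "health_check"]

def step(s, a):
    """Hypothetical MDP dynamics: module selections nudge the state at random."""
    if a == "health_check":
        return s                              # checking does not move the state
    return (s + random.choice([-1, 1])) % N_STATES

def label(s, a):
    """Atomic propositions observed on a transition."""
    return {"check"} if a == "health_check" else set()

def automaton_step(q, props):
    """One-state automaton for GF check: an accepting self-loop on 'check'."""
    return q, ("check" in props)              # (next state, transition accepting?)

# Q-learning over product states (s, q); reward 1 on accepting automaton
# transitions, 0 otherwise -- the reward signal derived from the LTL property.
Q = {}
alpha, gamma, eps = 0.1, 0.95, 0.2

def q_get(s, q, a):
    return Q.get((s, q, a), 0.0)

s, q = 0, 0
for _ in range(20000):
    if random.random() < eps:
        a = random.choice(ACTIONS)
    else:
        a = max(ACTIONS, key=lambda act: q_get(s, q, act))
    s2 = step(s, a)
    q2, accepting = automaton_step(q, label(s, a))
    r = 1.0 if accepting else 0.0
    best_next = max(q_get(s2, q2, a2) for a2 in ACTIONS)
    Q[(s, q, a)] = q_get(s, q, a) + alpha * (r + gamma * best_next - q_get(s, q, a))
    s, q = s2, q2

# The greedy policy should include health checks so that the accepting
# condition is visited infinitely often, i.e. the LTL property is satisfied.
policy = {s: max(ACTIONS, key=lambda act: q_get(s, 0, act)) for s in range(N_STATES)}
print(policy)
```

Because reward is paid only when the automaton takes an accepting transition, the search is effectively confined to behaviours consistent with the property, which is the sense in which the abstract says the safety properties bound the search space.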