Actor-critic objective penalty function method: an adaptive strategy for trajectory tracking in autonomous driving

Bibliographic Details
Main Authors: Bo Wang, Fusheng Bai, Ke Zhang
Format: Article
Language: English
Published: Springer 2023-09-01
Series: Complex & Intelligent Systems
Subjects:
Online Access: https://doi.org/10.1007/s40747-023-01238-6
_version_ 1797233159664828416
author Bo Wang
Fusheng Bai
Ke Zhang
author_sort Bo Wang
collection DOAJ
description Abstract Trajectory tracking is a key technology for controlling autonomous vehicles so that they follow a reference trajectory effectively and stably. Handling the various constraints that arise in trajectory tracking is very challenging. The recently proposed generalized exterior point method (GEP) shows high computational efficiency and good closed-loop performance in solving the constrained trajectory tracking problem. However, the neural networks used in the GEP may suffer from ill-conditioning during model training, which results in slow or even non-convergent training and in control outputs of the policy network that are suboptimal or even severely constraint-violating. To deal effectively with the large-scale nonlinear state-wise constraints and avoid the ill-conditioning issue, we propose a model-based reinforcement learning (RL) method called the actor-critic objective penalty function method (ACOPFM) for trajectory tracking in autonomous driving. We adopt an integrated decision and control (IDC)-based planning and control scheme to transform the trajectory tracking problem into MPC-based nonlinear programming problems, and we embed the objective penalty function method into an actor-critic solution framework. The nonlinear programming problem is transformed into an unconstrained optimization problem and employed as the loss function for updating the policy network, and the ill-conditioning issue is avoided by alternately performing gradient descent and adaptively adjusting the penalty parameter. The convergence of ACOPFM is proved. Simulation results demonstrate that ACOPFM converges to the optimal control strategy quickly and steadily, and performs well in a multi-lane test scenario.
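The alternation the abstract describes — gradient descent on an unconstrained penalized loss, followed by adaptive adjustment of the penalty parameter — can be sketched in a much simplified scalar form. This is a generic exterior quadratic penalty loop, not the authors' exact ACOPFM formulation; the toy problem, step-size rule, and all names below are illustrative assumptions.

```python
# Sketch: minimize f(x) subject to g(x) <= 0 by descending the penalized
# loss f(x) + sigma * max(0, g(x))^2, raising sigma until the constraint
# violation is small. Toy problem: minimize (x - 2)^2 subject to x <= 1,
# whose constrained optimum is x = 1.

def f(x):
    """Objective: (x - 2)^2."""
    return (x - 2.0) ** 2

def g(x):
    """Inequality constraint g(x) <= 0, i.e. x <= 1."""
    return x - 1.0

def grad_penalty(x, sigma):
    """Exact gradient of f(x) + sigma * max(0, g(x))^2."""
    return 2.0 * (x - 2.0) + 2.0 * sigma * max(0.0, g(x))

def solve(x=0.0, sigma=1.0, tol=1e-6, max_outer=12):
    for _ in range(max_outer):              # outer loop: penalty updates
        lr = 0.4 / (1.0 + sigma)            # shrink the step as sigma grows
        for _ in range(500):                # inner loop: gradient descent
            x -= lr * grad_penalty(x, sigma)
        if max(0.0, g(x)) < tol:            # feasible enough: stop
            return x
        sigma *= 10.0                       # adaptively raise the penalty
    return x

x_star = solve()                            # approaches x = 1 from the infeasible side
```

In ACOPFM the unconstrained loss plays the role of the policy-network training objective, and the inner gradient steps update the network parameters rather than a scalar; the alternation with the penalty update is what keeps the effective loss well-conditioned.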
first_indexed 2024-04-24T16:11:44Z
format Article
id doaj.art-132c9dc1b2644c96972597d39ddb47c4
institution Directory Open Access Journal
issn 2199-4536
2198-6053
language English
last_indexed 2024-04-24T16:11:44Z
publishDate 2023-09-01
publisher Springer
record_format Article
series Complex & Intelligent Systems
spelling doaj.art-132c9dc1b2644c96972597d39ddb47c4 (2024-03-31T11:39:40Z). Complex & Intelligent Systems 10(2): 1715–1732, Springer, 2023-09-01. doi:10.1007/s40747-023-01238-6
Author affiliations: Bo Wang, Fusheng Bai, and Ke Zhang are all with the National Center for Applied Mathematics in Chongqing, Chongqing Normal University.
title Actor-critic objective penalty function method: an adaptive strategy for trajectory tracking in autonomous driving
topic Autonomous driving
Trajectory tracking
Model predictive control (MPC)
Reinforcement learning (RL)
Objective penalty function method
url https://doi.org/10.1007/s40747-023-01238-6