Toward Learning Human-Like, Safe and Comfortable Car-Following Policies With a Novel Deep Reinforcement Learning Approach

Bibliographic Details
Main Authors: M. Ugur Yavas, Tufan Kumbasar, Nazim Kemal Ure
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10045659/
Description
Summary: In this paper, we present an advanced adaptive cruise control (ACC) concept powered by Deep Reinforcement Learning (DRL) that generates safe, human-like, and comfortable car-following policies. Unlike the current trend in developing DRL-based ACC systems, we propose defining the action space of the DRL agent with discrete actions rather than continuous ones, since human drivers do not set an absolute throttle/brake pedal level but rather command a change to the current pedal levels. Through this human-like throttle-brake manipulation representation, we also define explicit actions for holding (keeping the last action) and coasting (no action), which are usually omitted as actions in ACC systems. Moreover, based on an investigation of a real-world driving dataset, we design a novel reward function that is easy to interpret and personalized. The proposed reward drives the agent to learn stable and safe actions, while also encouraging the holding and coasting actions, just as a human driver would. The proposed discrete-action DRL agent is trained with action masking, and the reward terms are derived entirely from a real-world dataset collected from a human driver. We present exhaustive comparative results to show the advantages of the proposed DRL approach in both simulation and scenarios extracted from real-world driving. We clearly show that, in comparison with a DRL agent trained with a widely used reward function proposed for ACC, a model predictive control structure, and traditional car-following approaches, the proposed policy imitates human driving significantly better and implicitly handles complex driving situations such as cut-ins and cut-outs.
ISSN:2169-3536
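
The mechanism the abstract describes, incremental discrete pedal actions with explicit hold and coast options selected under action masking, can be illustrated with a short sketch. The Python below is not taken from the paper: the action names, the 0.1 increment, the [0, 1] pedal range, and the Q-value-based selection are all assumptions made for illustration.

```python
import numpy as np

# Illustrative discrete action set: each action changes the current pedal
# command instead of setting an absolute level. The increment size and
# pedal range are assumptions of this sketch, not values from the paper.
PEDAL_MIN, PEDAL_MAX, STEP = 0.0, 1.0, 0.1

ACTIONS = [
    "hold",          # keep the current pedal levels unchanged
    "coast",         # release both pedals (no actuation)
    "throttle_inc",  # increase throttle by STEP
    "throttle_dec",  # decrease throttle by STEP
    "brake_inc",     # increase brake by STEP
    "brake_dec",     # decrease brake by STEP
]

def action_mask(throttle: float, brake: float) -> np.ndarray:
    """Boolean mask of feasible actions given the current pedal levels.

    Infeasible increments (e.g., raising throttle past PEDAL_MAX, or
    pressing throttle while the brake is applied) are masked out so the
    agent can only select valid actions. Hold and coast stay valid.
    """
    eps = 1e-9  # tolerance for float comparisons
    mask = np.ones(len(ACTIONS), dtype=bool)
    mask[2] = throttle + STEP <= PEDAL_MAX + eps and brake == 0.0
    mask[3] = throttle - STEP >= PEDAL_MIN - eps
    mask[4] = brake + STEP <= PEDAL_MAX + eps and throttle == 0.0
    mask[5] = brake - STEP >= PEDAL_MIN - eps
    return mask

def apply_action(action: str, throttle: float, brake: float) -> tuple[float, float]:
    """Return the new (throttle, brake) pair after an incremental action."""
    if action == "hold":
        return throttle, brake
    if action == "coast":
        return 0.0, 0.0
    if action == "throttle_inc":
        return min(throttle + STEP, PEDAL_MAX), brake
    if action == "throttle_dec":
        return max(throttle - STEP, PEDAL_MIN), brake
    if action == "brake_inc":
        return throttle, min(brake + STEP, PEDAL_MAX)
    return throttle, max(brake - STEP, PEDAL_MIN)

# Masked greedy selection from (hypothetical) Q-values: invalid actions get
# -inf so they can never be chosen, which is how action masking is commonly
# wired into a discrete DRL agent.
q_values = np.random.randn(len(ACTIONS))
mask = action_mask(throttle=1.0, brake=0.0)  # throttle already at its limit
masked_q = np.where(mask, q_values, -np.inf)
best = ACTIONS[int(np.argmax(masked_q))]
print(best)  # never "throttle_inc": that increment is masked out here
```

Masking in this way keeps the learning problem well posed: the agent never spends probability mass on increments the actuators cannot realize, while hold and coast remain always-available choices, consistent with the human-like behavior the abstract emphasizes.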