Development of a deep Q-learning energy management system for a hybrid electric vehicle

Bibliographic Details
Main Authors: Luigi Tresca, Luca Pulvirenti, Luciano Rolando, Federico Millo
Format: Article
Language: English
Published: Elsevier, 2024-06-01
Series: Transportation Engineering
Online Access: http://www.sciencedirect.com/science/article/pii/S2666691X24000162
Description
Summary: In recent years, Machine Learning (ML) techniques have gained increasing popularity in several fields thanks to their ability to find hidden and complex relationships between data. Their capability to solve complex optimization tasks has also made them extremely attractive for the design of the Energy Management System (EMS) of electrified vehicles. Among the plethora of existing techniques, Reinforcement Learning (RL) algorithms have unprecedented potential, since they can self-learn by directly interacting with the external environment through a trial-and-error procedure. In this paper, a Deep Q-Learning (DQL) agent, which exploits Deep Neural Networks (DNNs) to map each state-action pair to its value, was trained to reduce the CO2 emissions of a state-of-the-art diesel Plug-in Hybrid Electric Vehicle (PHEV) available on the European market. The proposed methodology was tested on a virtual test rig of the investigated vehicle operating with a charge-sustaining logic. A sensitivity analysis was performed on the reward to test the capability of different penalty functions to improve fuel economy while guaranteeing battery charge sustainability. The potential of the proposed control strategy was first assessed on the Worldwide harmonized Light-duty vehicles Test Cycle (WLTC) and benchmarked against a Dynamic Programming (DP) optimization to evaluate each reward formulation. The best agent was then tested on a wide range of type-approval and Real Driving Emissions (RDE) scenarios. The results show that the best-performing agent reaches performance close to the DP reference, with a limited gap (7%) in terms of CO2 emissions.
ISSN: 2666-691X
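
For orientation, the minimal sketch below (in PyTorch) illustrates the general structure of a Deep Q-Learning agent of the kind the abstract describes: a deep neural network maps a state vector to one Q-value per discrete action, actions are chosen by epsilon-greedy trial-and-error exploration, and target values follow the Bellman backup. The reward shown combines a fuel-rate (CO2) term with a quadratic penalty on battery state-of-charge (SOC) deviation to encourage charge sustainability. All state definitions, network sizes, and coefficients here are illustrative assumptions, not the authors' implementation.

    # Illustrative Deep Q-Learning sketch; state/action definitions, network
    # sizes, and reward coefficients are assumptions, not the paper's setup.
    import random
    import torch
    import torch.nn as nn

    class QNetwork(nn.Module):
        """DNN mapping a state vector to one Q-value per discrete action."""
        def __init__(self, state_dim, n_actions, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, n_actions),
            )

        def forward(self, state):
            return self.net(state)

    def reward(fuel_rate, soc, soc_target=0.5, alpha=1.0, beta=10.0):
        # Hypothetical reward: penalize fuel (CO2) consumption plus a
        # quadratic penalty on SOC deviation for charge sustainability.
        return -alpha * fuel_rate - beta * (soc - soc_target) ** 2

    def select_action(q_net, state, epsilon, n_actions):
        # Epsilon-greedy exploration: random action with probability epsilon,
        # otherwise the action with the highest predicted Q-value.
        if random.random() < epsilon:
            return random.randrange(n_actions)
        with torch.no_grad():
            return int(q_net(state).argmax())

    def td_target(q_target_net, next_state, r, gamma=0.99, done=False):
        # Bellman backup: r + gamma * max_a' Q_target(s', a').
        if done:
            return torch.tensor(r)
        with torch.no_grad():
            return r + gamma * q_target_net(next_state).max()

    # Usage example with a hypothetical 4-dimensional state
    # (e.g., battery SOC, vehicle speed, acceleration, engine state):
    q_net = QNetwork(state_dim=4, n_actions=5)
    action = select_action(q_net, torch.rand(4), epsilon=0.1, n_actions=5)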