Optimal control and reinforcement learning for Formula One lap simulation


Bibliographic Details
Main Author: Hoeppke, C
Other Authors: Nakatsukasa, Y
Format: Thesis
Language: English
Published: 2022
Description
Summary: Lap simulation in a Formula One context is a subclass of optimal control problems and describes the computation of optimal trajectories around racing circuits. The results of lap simulation are primarily used for vehicle setup and strategic racing decisions. The optimal lap problem is solved using two classes of algorithms: the first uses direct collocation to compute optimal trajectories, and the second uses specially constructed reinforcement learning environments and generalised function approximation to compute desirable system inputs.

Historically, direct collocation methods were considered impractical for minimum-lap-time simulations due to their high computational cost; the exponential increase in computational performance has since enabled their practical application. These lap time simulations require a vehicle model as well as a track discretisation; as an example, the classical bicycle model and a curvilinear track model are introduced. To solve the resulting direct collocation problems, algorithms for non-linear optimisation are presented and performance-critical aspects are discussed. The optimisation algorithm is accelerated by exploiting highly parallel computer architectures such as graphics processing units (GPUs). An analytical gradient approximation is presented for the projection systems, which constitute one of the most performance-critical components of the solution process. Mesh refinement algorithms are discussed, and a novel mesh-refinement heuristic based on optimal polynomial approximation in the $L^1$ sense is introduced. The $L^1$ approximation is improved by detecting singularities and applying Clenshaw--Curtis quadrature on intermediary intervals. In Chapter 4 of this work, the lap time optimisation problem is reformulated as a reinforcement learning environment.
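The Clenshaw--Curtis quadrature mentioned above can be illustrated with a short, generic sketch (this is a standard textbook construction on $[-1, 1]$, not code from the thesis): the nodes are cosine-spaced Chebyshev points, and the weights follow from an explicit cosine-sum formula.

```python
import numpy as np

def clenshaw_curtis(n):
    """Nodes and weights for (n+1)-point Clenshaw-Curtis quadrature on [-1, 1]."""
    k = np.arange(n + 1)
    x = np.cos(np.pi * k / n)          # Chebyshev extreme points
    w = np.zeros(n + 1)
    for i in range(n + 1):
        s = 0.0
        for j in range(1, n // 2 + 1):
            b = 1.0 if 2 * j == n else 2.0
            s += b / (4 * j**2 - 1) * np.cos(2 * np.pi * j * i / n)
        c = 1.0 if i in (0, n) else 2.0  # endpoint weights are halved
        w[i] = c / n * (1.0 - s)
    return x, w

x, w = clenshaw_curtis(16)
# Smooth integrand: converges rapidly to the exact value e - 1/e.
approx = np.dot(w, np.exp(x))
```

For smooth integrands the rule converges geometrically, which is what makes it attractive on the smooth sub-intervals produced once singularities have been detected and split off.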
For this, the relevant background literature on reinforcement learning is discussed and the optimisation problem is translated into a training environment. Details of this environment are discussed in the form of reward signals, terminal conditions, and observation features. A series of learning models with increasing feature fidelity is presented, leading to an algorithm that generalises well across representations of circuits from the 2022 Formula One calendar. This work expands on the current literature by providing novel, physically motivated reinforcement learning environments for lap-time optimisation tasks. The results of both approaches are combined by using strategy extraction to initialise the collocation optimisation algorithm and optimise the underlying mesh.
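To make the reward-signal / terminal-condition / observation-feature structure concrete, here is a minimal, hypothetical lap-time environment for a point mass on a discretised centreline. All dynamics, constants, and feature choices below are illustrative assumptions, not the thesis environment:

```python
import numpy as np

class LapTimeEnv:
    """Toy lap-time RL environment (hypothetical sketch).

    State: arc-length position s, speed v, lateral offset d.
    Action: (throttle/brake, steering), each clipped to [-1, 1].
    Reward: progress along the track per step, with a penalty for
    leaving the track. Episode terminates off-track or at lap end.
    """

    def __init__(self, track_length=1000.0, half_width=5.0, dt=0.1):
        self.L, self.half_width, self.dt = track_length, half_width, dt

    def reset(self):
        self.s, self.v, self.d = 0.0, 10.0, 0.0
        return self._obs()

    def _obs(self):
        # Observation features: normalised progress, speed, lateral offset.
        return np.array([self.s / self.L, self.v / 100.0, self.d / self.half_width])

    def step(self, action):
        accel, steer = np.clip(action, -1.0, 1.0)
        self.v = max(0.0, self.v + 10.0 * accel * self.dt)  # longitudinal update
        self.d += steer * self.v * 0.1 * self.dt            # lateral drift
        progress = self.v * self.dt
        self.s += progress
        off_track = abs(self.d) > self.half_width           # terminal condition
        lap_done = self.s >= self.L
        reward = progress - (100.0 if off_track else 0.0)   # reward signal
        return self._obs(), reward, off_track or lap_done, {}

env = LapTimeEnv()
obs, done, total = env.reset(), False, 0.0
while not done:
    obs, r, done, _ = env.step((1.0, 0.0))  # full throttle, no steering
    total += r
```

Richer observation features (e.g. upcoming track curvature) are what allow a single learned policy to generalise across different circuits.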