Summary: Lap time simulation in a Formula One context is a subclass of optimal control problems and concerns the computation of time-optimal trajectories around racing circuits. Its results are primarily used for vehicle setup and strategic racing decisions. The optimal lap problem is solved using two classes of algorithms: the first uses direct collocation to compute optimal trajectories, while the second uses specially constructed reinforcement learning environments and generalised function approximation to compute desirable system inputs.
Historically, direct collocation methods were considered impractical for minimum lap time simulation due to their high computational cost. The exponential increase in computational performance has since enabled the practical application of these algorithms.
These lap time simulations require a vehicle model as well as a track discretisation. As examples, the classical bicycle model and a curvilinear track model are introduced.
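To make the combination of the two models concrete, the following is a minimal sketch of a kinematic single-track (bicycle) model expressed in curvilinear track coordinates; the state layout $(s, n, \mu, v)$, the wheelbase value, and all function names are illustrative assumptions rather than the exact formulation used in this work.

\begin{verbatim}
import numpy as np

def bicycle_curvilinear_rhs(x, u, kappa, L=3.0):
    # Kinematic single-track (bicycle) model in curvilinear track
    # coordinates: s is arc length along the centreline, n the lateral
    # offset, mu the heading relative to the centreline tangent and v
    # the speed; u = (steering angle delta, longitudinal acceleration a).
    # kappa(s) is the centreline curvature; L is a hypothetical wheelbase.
    s, n, mu, v = x
    delta, a = u
    s_dot = v * np.cos(mu) / (1.0 - n * kappa(s))      # progress along centreline
    n_dot = v * np.sin(mu)                             # lateral drift
    mu_dot = v * np.tan(delta) / L - kappa(s) * s_dot  # heading relative to tangent
    v_dot = a
    return np.array([s_dot, n_dot, mu_dot, v_dot])
\end{verbatim}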
To solve the resulting direct collocation problems, algorithms for non-linear optimisation are presented and performance-critical aspects are discussed.
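As a schematic of how such a problem is transcribed, the sketch below applies trapezoidal direct collocation to a toy minimum-time problem (a double integrator standing in for the full vehicle model) and hands the resulting non-linear program to an off-the-shelf solver; the mesh size, bounds, and solver choice are illustrative assumptions.

\begin{verbatim}
import numpy as np
from scipy.optimize import minimize, NonlinearConstraint

N = 20  # number of mesh nodes (illustrative)

def unpack(z):
    # z = [T, p_0..p_{N-1}, v_0..v_{N-1}, a_0..a_{N-1}]
    T = z[0]
    p, v, a = np.split(z[1:], 3)
    return T, p, v, a

def defects(z):
    # Trapezoidal defects x_{k+1} - x_k - h/2 (f_k + f_{k+1}) plus
    # boundary conditions: start at rest at 0, finish at distance 1.
    T, p, v, a = unpack(z)
    h = T / (N - 1)
    dp = p[1:] - p[:-1] - 0.5 * h * (v[1:] + v[:-1])
    dv = v[1:] - v[:-1] - 0.5 * h * (a[1:] + a[:-1])
    return np.concatenate([dp, dv, [p[0], v[0], p[-1] - 1.0]])

z0 = np.concatenate([[1.0], np.linspace(0.0, 1.0, N),
                     np.ones(N), np.zeros(N)])
bounds = [(1e-2, None)] + [(None, None)] * (2 * N) + [(-2.0, 2.0)] * N
res = minimize(lambda z: z[0], z0, bounds=bounds, method="SLSQP",
               constraints=NonlinearConstraint(defects, 0.0, 0.0))
print("minimum time:", res.x[0])
\end{verbatim}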
The optimisation algorithm is accelerated by utilising highly parallel computer architectures, such as graphics processing units (GPUs). An analytical gradient approximation is presented for the projection systems, which constitute one of the most performance-critical components of the solution process.
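The idea behind such an analytical gradient can be illustrated on a generic projection operation: projecting a Cartesian point onto a parametrised centreline and differentiating the projection through the implicit function theorem rather than through the Newton iterations. The toy circular centreline and all names below are assumptions, not the projection systems treated in this work.

\begin{verbatim}
import numpy as np

def centreline(s):        # toy centreline: a circle of radius 100 m
    return 100.0 * np.array([np.cos(s / 100.0), np.sin(s / 100.0)])

def d_centreline(s):      # first derivative c'(s) (unit tangent here)
    return np.array([-np.sin(s / 100.0), np.cos(s / 100.0)])

def dd_centreline(s):     # second derivative c''(s)
    return np.array([-np.cos(s / 100.0), -np.sin(s / 100.0)]) / 100.0

def project(p, s0, iters=20):
    # Newton iteration on the closest-point optimality condition
    # g(s) = c'(s) . (c(s) - p) = 0.
    s = s0
    for _ in range(iters):
        r = centreline(s) - p
        dg = dd_centreline(s) @ r + d_centreline(s) @ d_centreline(s)
        s -= (d_centreline(s) @ r) / dg
    # Implicit function theorem: dg/ds * ds/dp + dg/dp = 0 with
    # dg/dp = -c'(s), so ds/dp = c'(s) / (dg/ds) -- an analytical
    # gradient that avoids differentiating through the Newton loop.
    r = centreline(s) - p
    dg = dd_centreline(s) @ r + d_centreline(s) @ d_centreline(s)
    return s, d_centreline(s) / dg
\end{verbatim}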
Mesh refinement algorithms are reviewed, and a novel mesh refinement heuristic based on optimal polynomial approximation in an $L^1$ sense is introduced. The $L^1$ approximation is improved by detecting singularities and using Clenshaw--Curtis quadrature on the intermediate intervals.
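A sketch of the quadrature building block: Clenshaw--Curtis nodes and weights from the explicit cosine-sum formula, applied per interval between detected break points to estimate an $L^1$ approximation error. The singularity detection itself is assumed to be given here.

\begin{verbatim}
import numpy as np

def clenshaw_curtis(n):
    # Nodes and weights of the (n+1)-point Clenshaw-Curtis rule on
    # [-1, 1], via the explicit cosine-sum formula for the weights.
    k = np.arange(n + 1)
    x = np.cos(k * np.pi / n)                # Chebyshev extrema
    w = np.ones(n + 1)
    for j in range(1, n // 2 + 1):
        b = 1.0 if 2 * j == n else 2.0
        w -= b * np.cos(2.0 * j * k * np.pi / n) / (4 * j * j - 1)
    w *= 2.0 / n
    w[[0, -1]] /= 2.0
    return x, w

def l1_error(f, p, breaks, n=32):
    # Approximate int |f - p| by applying Clenshaw-Curtis quadrature
    # separately on each interval between detected singularities, so
    # that the integrand is smooth on every subinterval.
    x, w = clenshaw_curtis(n)
    total = 0.0
    for a, b in zip(breaks[:-1], breaks[1:]):
        t = 0.5 * (b - a) * x + 0.5 * (a + b)   # map [-1, 1] -> [a, b]
        total += 0.5 * (b - a) * np.sum(w * np.abs(f(t) - p(t)))
    return total
\end{verbatim}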
In Chapter 4 of this work, the lap time optimisation problem is reformulated as a reinforcement learning environment. For this, the relevant background literature on reinforcement learning is discussed and a translation of the optimisation problem into a training environment is constructed. Details of this environment are discussed in the form of reward signals, terminal conditions, and observation features. A series of learning models of increasing feature fidelity is discussed, leading to an algorithm that generalises well across representations of circuits from the 2022 Formula One calendar. This work expands on the current literature by providing novel, physically motivated reinforcement learning environments for lap time optimisation tasks.
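A skeletal Gymnasium-style environment in the spirit of this construction is sketched below, reusing the bicycle dynamics from the earlier sketch. The progress-based reward, the track-limit termination, the curvature look-ahead observations, and all constants are illustrative assumptions, not the environments developed in Chapter 4.

\begin{verbatim}
import numpy as np
import gymnasium as gym

class LapTimeEnv(gym.Env):
    # Skeletal lap-time environment; reuses bicycle_curvilinear_rhs
    # from the earlier sketch. All design choices here are illustrative.

    def __init__(self, kappa, lap_length=5000.0, half_width=6.0, dt=0.1):
        self.kappa, self.lap, self.w, self.dt = kappa, lap_length, half_width, dt
        # actions: steering angle and longitudinal acceleration
        self.action_space = gym.spaces.Box(
            np.array([-0.5, -10.0], dtype=np.float32),
            np.array([0.5, 10.0], dtype=np.float32))
        # observations: lateral offset, relative heading, speed and the
        # curvature of the track ahead (a simple look-ahead feature)
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(8,))

    def _obs(self):
        s, n, mu, v = self.x
        ahead = [self.kappa(s + d) for d in (0.0, 50.0, 100.0, 150.0, 200.0)]
        return np.array([n, mu, v, *ahead], dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.x = np.array([0.0, 0.0, 0.0, 10.0])   # s, n, mu, v
        return self._obs(), {}

    def step(self, action):
        x_new = self.x + self.dt * bicycle_curvilinear_rhs(
            self.x, action, self.kappa)
        reward = x_new[0] - self.x[0]              # reward: progress along track
        self.x = x_new
        off_track = abs(self.x[1]) > self.w        # terminal: track limits
        if off_track:
            reward -= 100.0                        # penalty on termination
        terminated = off_track or self.x[0] >= self.lap
        return self._obs(), reward, terminated, False, {}
\end{verbatim}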
The results of both approaches are combined by using strategy extraction to initialise the collocation optimisation algorithm and to optimise the underlying mesh.
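One plausible shape of this combination step is sketched below: the trained policy is rolled out once and the visited trajectory is resampled onto the collocation mesh as an initial guess. The \texttt{env} and \texttt{policy} objects and the resampling choice are assumptions for illustration.

\begin{verbatim}
import numpy as np

def extract_initial_guess(env, policy, n_nodes):
    # Roll out the trained policy once, then resample the visited
    # trajectory onto the collocation mesh (uniform in s, assuming
    # monotone forward progress) as the NLP's initial guess.
    states, controls = [], []
    obs, _ = env.reset()
    done = False
    while not done:
        action = policy(obs)
        states.append(env.x.copy())          # curvilinear state (s, n, mu, v)
        controls.append(np.asarray(action))
        obs, _, done, _, _ = env.step(action)
    X, U = np.array(states), np.array(controls)
    s_mesh = np.linspace(X[0, 0], X[-1, 0], n_nodes)
    x0 = np.column_stack([np.interp(s_mesh, X[:, 0], X[:, i])
                          for i in range(X.shape[1])])
    u0 = np.column_stack([np.interp(s_mesh, X[:, 0], U[:, i])
                          for i in range(U.shape[1])])
    return x0, u0
\end{verbatim}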