Analysis of online learning algorithms in machine learning

Bibliographic Details
Main Author: Wang, Z
Other Authors: Sirignano, J
Format: Thesis
Language: English
Published: 2024
Description
Summary: In this thesis, we consider the problem of optimizing parameters of the stationary distribution of Markov decision processes, stochastic differential equations (SDEs), and stochastic partial differential equations (SPDEs). First, we study online actor-critic algorithms in reinforcement learning with tabular parametrization and prove that, under a time rescaling, the algorithm converges to a system of ordinary differential equations (ODEs) as the number of updates becomes large. Convergence to the optimal strategy, together with a convergence rate, is established using a two-time-scale analysis that asymptotically decouples the critic ODE from the actor ODE. Next, under the same framework, we show that when both the actor and the critic are parameterized by single-layer neural networks, the actor-critic algorithm converges in distribution to a system of ODEs with random initial conditions as the number of hidden units and the number of training steps go to infinity. Convergence of the limit actor network to a stationary point is also established. Further, we develop a new continuous-time stochastic gradient descent method for optimizing over the stationary distribution of SDE models. The novel idea of our algorithm is that the gradient estimate is updated simultaneously with the SDE state via forward propagation of the state derivatives, and it asymptotically converges to the direction of steepest descent. We rigorously prove convergence of the online forward-propagation algorithm for linear SDE models and present numerical results for a range of mathematical finance applications. Finally, we establish convergence of our algorithm for a class of nonlinear dissipative SDEs whose drift and volatility functions both depend on the parameters being optimized. We also demonstrate an application of our algorithm to Neural SPDEs.
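The forward-propagation idea described in the abstract can be sketched in a minimal linear-SDE setting. The sketch below is an illustration only, not the thesis's exact algorithm: it assumes a one-dimensional Ornstein-Uhlenbeck model dX = (theta - X) dt + sigma dW (whose stationary law is N(theta, sigma^2/2)) and a quadratic stationary cost E[(X - c)^2]; the state derivative X_tilde = dX/dtheta is propagated forward alongside the state, and the parameter is updated online with a decaying step size.

```python
import numpy as np

# Illustrative sketch of online forward-propagation SGD over the stationary
# distribution of a linear SDE. Model and cost are assumptions for this demo:
#   dX_t = (theta - X_t) dt + sigma dW_t,  stationary law N(theta, sigma^2/2),
#   J(theta) = E_pi[(X - c)^2],  minimized at theta = c.

rng = np.random.default_rng(0)

c = 1.5          # target mean (hypothetical)
sigma = 0.3      # volatility
dt = 1e-2        # Euler-Maruyama time step
n_steps = 200_000

theta = 0.0      # parameter being optimized
x = 0.0          # SDE state
x_tilde = 0.0    # forward sensitivity dX/dtheta

for k in range(n_steps):
    dW = rng.normal(0.0, np.sqrt(dt))
    # Forward propagation of the state derivative: with mu = theta - x,
    # mu_theta = 1 and mu_x = -1, so d(x_tilde) = (1 - x_tilde) dt.
    x_tilde += (1.0 - x_tilde) * dt
    # Euler-Maruyama step for the state itself.
    x += (theta - x) * dt + sigma * dW
    # Online estimate of d/dtheta (x - c)^2 via the chain rule,
    # followed by a descent step with a decaying learning rate.
    grad = 2.0 * (x - c) * x_tilde
    alpha = dt / (1.0 + 0.01 * k)
    theta -= alpha * grad

print(f"theta after training: {theta:.3f} (optimum is c = {c})")
```

Because the gradient estimate is computed from a single online trajectory, no restarts or long burn-in simulations of the stationary distribution are needed; the estimate becomes unbiased only asymptotically, which is why the decaying step size matters.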