Compositionality and functorial invariants in machine learning

Bibliographic Details
Main Author: Shiebler, D
Other Authors: Gibbons, J
Format: Thesis
Language: English
Published: 2023
Description
Summary: The objective of this thesis is to show that studying the underlying compositional and functorial structure in machine learning systems allows us to understand them better. To do this, we explore category-theoretic formulations of many subareas of machine learning, including optimization, probability, unsupervised learning, and supervised learning.

We begin with an investigation of how various optimization algorithms behave when we replace the gradient with a generic category-theoretic structure. We prove that the key properties of these algorithms hold under very relaxed assumptions, and we demonstrate this result through numerical experiments. We also explore a category-theoretic perspective on dynamical systems that enables us to build powerful optimizers from the composition of simple operations.

Next, we take a category-theoretic perspective on the relationship between probabilistic modeling and gradient-based optimization. We use this perspective to study how maximum likelihood estimation preserves certain key structures in the transformation from a statistical model to a supervised learning algorithm.

We then take a functorial perspective on unsupervised learning. We develop taxonomies of unsupervised learning algorithms based on the category-theoretic properties of their functorial representations, and we demonstrate that these taxonomies are predictive of algorithm behavior. We use this perspective to derive a host of new unsupervised learning algorithms for clustering and manifold learning, and we demonstrate that these new algorithms can outperform commonly used alternatives on real-world data. We also use these tools to prove new results on the behavior and limitations of popular unsupervised learning algorithms, including refinement bounds and stability in the face of noise.

Finally, we turn to supervised learning and demonstrate that many of the most common problems in data science and machine learning can be expressed as Kan extensions. We use this perspective to derive novel classification and supervised clustering algorithms, and we explore the performance of these algorithms on real data.
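To make the first of these ideas concrete, here is a minimal Python sketch (not code from the thesis; the `iterate_optimizer` interface and the step sizes are illustrative assumptions) of an optimizer template in which the gradient is just one interchangeable source of update directions:

```python
from typing import Callable

# Hypothetical interface: the "update rule" plays the role that the
# gradient plays in ordinary gradient descent. Any map from current
# parameters to next parameters can be slotted into the same template.
UpdateRule = Callable[[float], float]

def iterate_optimizer(step: UpdateRule, x0: float, n_steps: int) -> float:
    """Run a generic first-order optimizer by repeatedly applying `step`."""
    x = x0
    for _ in range(n_steps):
        x = step(x)
    return x

# Target: minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3).
grad = lambda x: 2.0 * (x - 3.0)

# Instance 1: ordinary gradient descent.
gd_step = lambda x: x - 0.1 * grad(x)

# Instance 2: sign gradient descent -- same template, a different
# gradient-like structure supplying the update direction.
sign = lambda v: (v > 0) - (v < 0)
sign_step = lambda x: x - 0.01 * sign(grad(x))

print(iterate_optimizer(gd_step, 10.0, 100))     # converges to ~3.0
print(iterate_optimizer(sign_step, 10.0, 1000))  # settles within 0.01 of 3.0
```

Both instances share one iteration template; the abstract's claim is that the key properties of such algorithms can be established at the level of that shared structure, under relaxed assumptions, rather than separately for each gradient-like map.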
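The functorial view of clustering admits a similarly small sketch. Single-linkage clustering at a fixed scale is a standard example of a functorial clustering algorithm: a non-expansive map between finite metric spaces carries each cluster into a single cluster of the codomain. The toy example below (illustrative names, not the thesis's code) checks this on a small pair of spaces:

```python
from itertools import combinations

def single_linkage(points, dist, delta):
    """Partition `points` by the transitive closure of dist(p, q) <= delta:
    flat single-linkage clustering at scale delta, via union-find."""
    parent = {p: p for p in points}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for p, q in combinations(points, 2):
        if dist(p, q) <= delta:
            parent[find(p)] = find(q)  # union the two components

    clusters = {}
    for p in points:
        clusters.setdefault(find(p), set()).add(p)
    return list(clusters.values())

# A non-expansive map f: X -> Y, i.e. |f(x) - f(x')| <= |x - x'|.
X = [0.0, 0.1, 5.0, 5.1]
Y = [0.0, 0.05, 2.5]
f = {0.0: 0.0, 0.1: 0.05, 5.0: 2.5, 5.1: 2.5}
d = lambda a, b: abs(a - b)

cX = single_linkage(X, d, 0.2)  # [{0.0, 0.1}, {5.0, 5.1}]
cY = single_linkage(Y, d, 0.2)  # [{0.0, 0.05}, {2.5}]

# Functoriality: the image of each cluster of X lands inside a single
# cluster of Y, so f induces a well-defined map between the partitions.
for c in cX:
    image = {f[p] for p in c}
    assert any(image <= c2 for c2 in cY)
print(cX, cY)
```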
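Finally, the Kan-extension view of classification can be sketched over a preorder of feature vectors: extending a labeling of the training inputs along their inclusion into the full input space yields two canonical monotone classifiers, an eager one (the left Kan extension) and a conservative one (the right Kan extension). The componentwise order and the function names below are assumptions made for illustration:

```python
# Training labels f: A -> {0, 1} (with 0 <= 1) are extended along the
# inclusion of the training inputs A into the full input preorder.

def leq(a, b):
    """Componentwise preorder on feature tuples."""
    return all(x <= y for x, y in zip(a, b))

def lan_classify(train, x):
    """Left Kan extension: predict 1 iff some positively labeled
    training point lies below x (the most eager monotone extension)."""
    return int(any(label == 1 and leq(a, x) for a, label in train))

def ran_classify(train, x):
    """Right Kan extension: predict 1 iff every training point above x
    is positively labeled (the most conservative monotone extension)."""
    return int(all(label == 1 for a, label in train if leq(x, a)))

train = [((1, 1), 0), ((2, 3), 1), ((3, 2), 1)]
print(lan_classify(train, (2, 4)))  # 1: (2, 3) <= (2, 4) and is labeled 1
print(ran_classify(train, (0, 0)))  # 0: (1, 1) lies above but is labeled 0
```

On inputs with no training point below (resp. above) them, the left (resp. right) extension defaults to 0 (resp. 1), matching the empty supremum and infimum in the two-element order.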