Towards Effective Theories for Deep Learning and Beyond

Bibliographic Details
Main Author: Zhai, Xiyu
Other Authors: Rakhlin, Alexander (Sasha)
Format: Thesis
Published: Massachusetts Institute of Technology 2024
Online Access: https://hdl.handle.net/1721.1/156548
Description
Summary: Deep learning has been highly successful over the past decade, with rapid progress across domains such as computer vision, natural language processing, and reinforcement learning. However, the theoretical understanding of its success is limited, and its behavior constantly defies our traditional theoretical understanding of machine learning. We present our work on deep learning theory and show powerful techniques we developed for studying the optimization and generalization behavior of wide neural networks, which help narrow the gap between theory and practice. Inspired by the recent success of transformer-based systems such as ChatGPT and SORA, we also present our work on the expressive power of vision transformers. In addition, we have been working on new AI theories and algorithms that go beyond deep learning, and on a new AI programming system that aims to make implementing these ideas possible. Completing a demonstration of the new AI is beyond the scope of a PhD, but we aim to show strong evidence of its feasibility.