Load balancing and memory optimizations for expert parallel training of large language models

Large language models (LLMs) are an effective way to solve many text-based machine learning tasks, but they require huge amounts of computation to train and evaluate. Mixture-of-experts models have emerged as a way to reduce the computation required for LLMs without compromising accuracy. It is...

Bibliographic Details
Main Author: Wisdom, Daniel
Other Authors: Leiserson, Charles E.
Format: Thesis
Published: Massachusetts Institute of Technology, 2024
Online Access: https://hdl.handle.net/1721.1/153897