Load balancing and memory optimizations for expert parallel training of large language models
Large language models (LLMs) are an effective way to solve many text-based machine learning tasks, but require huge amounts of computation to train and evaluate. Mixture of experts models have emerged as a way to reduce the amount of computation required for LLMs without compromising accuracy. It is...
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis |
Published: | Massachusetts Institute of Technology, 2024 |
Online Access: | https://hdl.handle.net/1721.1/153897 |