Optimizing Single DGX-A100 System: Overcoming GPU Limitations via Efficient Parallelism and Scheduling for Large Language Models
In this study, we introduce a novel training algorithm specifically designed to overcome the limitations of GPU memory on a single DGX-A100 system. By utilizing the CPU and main memory in the training process and applying a strategy of division and parallelization, our algorithm enhances the size of...
| Field | Value |
|---|---|
| Main Authors | Kyeong-Hwan Kim, Chang-Sung Jeong |
| Format | Article |
| Language | English |
| Published | MDPI AG, 2023-08-01 |
| Series | Applied Sciences |
| Online Access | https://www.mdpi.com/2076-3417/13/16/9306 |
Similar Items
- DGX-A100 Face to Face DGX-2—Performance, Power and Thermal Behavior Evaluation
  by: Matej Špeťko, et al.
  Published: (2021-01-01)
- Development of a CPU-GPU heterogeneous platform based on a nonlinear parallel algorithm
  by: Ma Haifeng
  Published: (2022-06-01)
- Parallel computing: from multicores and GPU's to Petascale
  by: ParCo 2009 (2009 : Lyon, France), et al.
  Published: (c201)
- Dynamic SIMD Parallel Execution on GPU from High-Level Dataflow Synthesis
  by: Aurelien Bloch, et al.
  Published: (2022-07-01)
- Real-Time Simulation and Optimization of Elastic Aircraft Vehicle Based on Multi-GPU Workstation
  by: Binxing Hu, et al.
  Published: (2019-01-01)