Optimizing Single DGX-A100 System: Overcoming GPU Limitations via Efficient Parallelism and Scheduling for Large Language Models
In this study, we introduce a novel training algorithm specifically designed to overcome the limitations of GPU memory on a single DGX-A100 system. By utilizing the CPU and main memory in the training process and applying a strategy of division and parallelization, our algorithm enhances the size of...
Main Authors: | Kyeong-Hwan Kim, Chang-Sung Jeong |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-08-01 |
Series: | Applied Sciences |
Subjects: | heterogeneous systems; natural language processing; model parallelism |
Online Access: | https://www.mdpi.com/2076-3417/13/16/9306 |
_version_ | 1797585646664024064 |
---|---|
author | Kyeong-Hwan Kim; Chang-Sung Jeong |
author_facet | Kyeong-Hwan Kim; Chang-Sung Jeong |
author_sort | Kyeong-Hwan Kim |
collection | DOAJ |
description | In this study, we introduce a novel training algorithm specifically designed to overcome the limitations of GPU memory on a single DGX-A100 system. By utilizing the CPU and main memory in the training process and applying a strategy of division and parallelization, our algorithm enhances the size of the trainable language model and the batch size. In addition, we developed a comprehensive management system to effectively manage the execution of the algorithm. This system systematically controls the training process and resource usage, while also enabling the asynchronous deployment of tasks. Finally, we proposed a scheduling technique integrated into the management system, promoting efficient task scheduling in a complex, heterogeneous training environment. These advancements equip researchers with the ability to work with larger models and batch sizes, even when faced with limited GPU memory. |
first_indexed | 2024-03-11T00:10:02Z |
format | Article |
id | doaj.art-e9ed207289ce44d78321e3b683044dab |
institution | Directory Open Access Journal |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-11T00:10:02Z |
publishDate | 2023-08-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-e9ed207289ce44d78321e3b683044dab; 2023-11-19T00:07:36Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2023-08-01; vol. 13, no. 16, art. 9306; 10.3390/app13169306; Optimizing Single DGX-A100 System: Overcoming GPU Limitations via Efficient Parallelism and Scheduling for Large Language Models; Kyeong-Hwan Kim (0), Department of Electrical Engineering, Korea University, Seoul 02841, Republic of Korea; Chang-Sung Jeong (1), Department of Electrical Engineering, Korea University, Seoul 02841, Republic of Korea; In this study, we introduce a novel training algorithm specifically designed to overcome the limitations of GPU memory on a single DGX-A100 system. By utilizing the CPU and main memory in the training process and applying a strategy of division and parallelization, our algorithm enhances the size of the trainable language model and the batch size. In addition, we developed a comprehensive management system to effectively manage the execution of the algorithm. This system systematically controls the training process and resource usage, while also enabling the asynchronous deployment of tasks. Finally, we proposed a scheduling technique integrated into the management system, promoting efficient task scheduling in a complex, heterogeneous training environment. These advancements equip researchers with the ability to work with larger models and batch sizes, even when faced with limited GPU memory.; https://www.mdpi.com/2076-3417/13/16/9306; heterogeneous systems; natural language processing; model parallelism |
spellingShingle | Kyeong-Hwan Kim; Chang-Sung Jeong; Optimizing Single DGX-A100 System: Overcoming GPU Limitations via Efficient Parallelism and Scheduling for Large Language Models; Applied Sciences; heterogeneous systems; natural language processing; model parallelism |
title | Optimizing Single DGX-A100 System: Overcoming GPU Limitations via Efficient Parallelism and Scheduling for Large Language Models |
title_full | Optimizing Single DGX-A100 System: Overcoming GPU Limitations via Efficient Parallelism and Scheduling for Large Language Models |
title_fullStr | Optimizing Single DGX-A100 System: Overcoming GPU Limitations via Efficient Parallelism and Scheduling for Large Language Models |
title_full_unstemmed | Optimizing Single DGX-A100 System: Overcoming GPU Limitations via Efficient Parallelism and Scheduling for Large Language Models |
title_short | Optimizing Single DGX-A100 System: Overcoming GPU Limitations via Efficient Parallelism and Scheduling for Large Language Models |
title_sort | optimizing single dgx a100 system overcoming gpu limitations via efficient parallelism and scheduling for large language models |
topic | heterogeneous systems; natural language processing; model parallelism |
url | https://www.mdpi.com/2076-3417/13/16/9306 |
work_keys_str_mv | AT kyeonghwankim optimizingsingledgxa100systemovercominggpulimitationsviaefficientparallelismandschedulingforlargelanguagemodels AT changsungjeong optimizingsingledgxa100systemovercominggpulimitationsviaefficientparallelismandschedulingforlargelanguagemodels |