Optimizing Single DGX-A100 System: Overcoming GPU Limitations via Efficient Parallelism and Scheduling for Large Language Models

Bibliographic Details
Main Authors:	Kyeong-Hwan Kim, Chang-Sung Jeong
Format:	Article
Language:	English
Published:	MDPI AG, 2023-08-01
Series:	Applied Sciences
Subjects:	heterogeneous systems; natural language processing; model parallelism
Online Access:	https://www.mdpi.com/2076-3417/13/16/9306
DOI:	10.3390/app13169306
Volume/Issue:	Volume 13, Issue 16, Article 9306
ISSN:	2076-3417
Affiliation:	Department of Electrical Engineering, Korea University, Seoul 02841, Republic of Korea (both authors)
Collection:	DOAJ (Directory of Open Access Journals)

Abstract

In this study, we introduce a novel training algorithm specifically designed to overcome the limitations of GPU memory on a single DGX-A100 system. By utilizing the CPU and main memory in the training process and applying a strategy of division and parallelization, our algorithm enhances the size of the trainable language model and the batch size. In addition, we developed a comprehensive management system to effectively manage the execution of the algorithm. This system systematically controls the training process and resource usage, while also enabling the asynchronous deployment of tasks. Finally, we proposed a scheduling technique integrated into the management system, promoting efficient task scheduling in a complex, heterogeneous training environment. These advancements equip researchers with the ability to work with larger models and batch sizes, even when faced with limited GPU memory.
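
The abstract describes the approach only at a high level: main memory holds what the GPU cannot, the work is divided into parts, and transfers and computation are scheduled asynchronously. The Python sketch that follows is a generic illustration of that idea under stated assumptions, not the authors' algorithm. It assumes a model built from sequential layers with hypothetical per-layer GPU footprints (`size_gb`), greedily divides the layers into chunks that each fit a per-chunk budget while the other chunks wait in host memory, and estimates one forward pass when the host-to-GPU copy of the next chunk overlaps with computation on the current chunk (two GPU buffers). All names (`Layer`, `partition_layers`, `pipelined_time`, `sequential_time`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    name: str
    size_gb: float  # hypothetical GPU footprint of this layer while it runs


def partition_layers(layers: List[Layer], chunk_budget_gb: float) -> List[List[Layer]]:
    """Greedily group consecutive layers into chunks that each fit the budget.

    Assumption: chunks not currently on the GPU wait in host (main) memory.
    """
    chunks: List[List[Layer]] = []
    current: List[Layer] = []
    used = 0.0
    for layer in layers:
        if layer.size_gb > chunk_budget_gb:
            raise ValueError(f"{layer.name} does not fit in a single chunk")
        if current and used + layer.size_gb > chunk_budget_gb:
            chunks.append(current)  # close the full chunk and start a new one
            current, used = [], 0.0
        current.append(layer)
        used += layer.size_gb
    if current:
        chunks.append(current)
    return chunks


def sequential_time(transfer: List[float], compute: List[float]) -> float:
    """Copy a chunk, compute on it, then copy the next: no overlap at all."""
    return sum(transfer) + sum(compute)


def pipelined_time(transfer: List[float], compute: List[float]) -> float:
    """Two GPU buffers: copying chunk i may overlap computing on chunk i-1.

    transfer[i] and compute[i] are hypothetical per-chunk durations in seconds.
    """
    if not compute:
        return 0.0
    t_done: List[float] = []  # time at which the copy of chunk i finishes
    c_done: List[float] = []  # time at which computation on chunk i finishes
    for i in range(len(compute)):
        start_copy = t_done[i - 1] if i > 0 else 0.0  # one copy at a time
        if i >= 2:
            # Chunk i reuses the buffer of chunk i-2, so wait until that chunk is done.
            start_copy = max(start_copy, c_done[i - 2])
        t_done.append(start_copy + transfer[i])
        # Compute needs the chunk's weights on the GPU and the previous chunk's output.
        start_compute = max(t_done[i], c_done[i - 1] if i > 0 else 0.0)
        c_done.append(start_compute + compute[i])
    return c_done[-1]


if __name__ == "__main__":
    layers = [Layer(f"block{i}", 4.0) for i in range(4)]
    chunks = partition_layers(layers, chunk_budget_gb=8.0)
    print([[layer.name for layer in chunk] for chunk in chunks])
    # [['block0', 'block1'], ['block2', 'block3']]
    transfer, compute = [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]
    print(sequential_time(transfer, compute), pipelined_time(transfer, compute))
    # 6.0 4.0
```

Because two chunks may sit on the GPU at once in this model, the per-chunk budget should be roughly half of the GPU memory set aside for weights. The timing model is deliberately simple: it ignores the backward pass, optimizer state, and contention for PCIe or NVLink bandwidth.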

Similar Items