Efficient Use of GPU Memory for Large-Scale Deep Learning Model Training

To achieve high accuracy in deep learning, it is necessary to use a large-scale model. However, limited GPU memory makes it difficult to train such models on a single GPU. NVIDIA introduced a technology called CUDA Unified Memory with CUDA 6...


Bibliographic Details
Main Authors:	Hyeonseong Choi, Jaehwan Lee
Format:	Article
Language:	English
Published:	MDPI AG 2021-11-01
Series:	Applied Sciences
Subjects:	deep learning; large-scale model; CUDA Unified Memory; PyTorch
Online Access:	https://www.mdpi.com/2076-3417/11/21/10377

author	Hyeonseong Choi, Jaehwan Lee
collection	DOAJ
description	To achieve high accuracy in deep learning, it is necessary to use a large-scale model. However, limited GPU memory makes it difficult to train such models on a single GPU. With CUDA 6, NVIDIA introduced CUDA Unified Memory, which overcomes this limitation by virtually combining GPU memory and CPU memory. CUDA 8 added memory advise options that allow Unified Memory to be used more efficiently. In this work, we propose an optimized scheme based on CUDA Unified Memory that uses GPU memory efficiently by applying a different memory advise option to each data type according to its access pattern during deep learning training. We apply CUDA Unified Memory to PyTorch to evaluate the performance of large-scale models with the expanded GPU memory, and we conduct comprehensive experiments on how to use Unified Memory efficiently by applying memory advise options during training. When the data used for deep learning are divided into three types and a memory advise option is applied to each type according to its access pattern, the deep learning execution time is reduced by 9.4% compared to the default Unified Memory.
first_indexed	2024-03-10T06:05:39Z
format	Article
id	doaj.art-86e2e525a3c74baf80b24bf608c75dbb
institution	Directory of Open Access Journals
issn	2076-3417
language	English
last_indexed	2024-03-10T06:05:39Z
publishDate	2021-11-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
spelling	doaj.art-86e2e525a3c74baf80b24bf608c75dbb | 2023-11-22T20:31:55Z | eng | MDPI AG | Applied Sciences | 2076-3417 | 2021-11-01 | 11(21), 10377 | 10.3390/app112110377 | Efficient Use of GPU Memory for Large-Scale Deep Learning Model Training | Hyeonseong Choi (0), Jaehwan Lee (1) | School of Electronics and Information Engineering, Korea Aerospace University, Goyang-si 10540, Korea (both authors) | https://www.mdpi.com/2076-3417/11/21/10377 | deep learning; large-scale model; CUDA Unified Memory; PyTorch
title	Efficient Use of GPU Memory for Large-Scale Deep Learning Model Training
topic	deep learning; large-scale model; CUDA Unified Memory; PyTorch
url	https://www.mdpi.com/2076-3417/11/21/10377
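
The idea summarized in the description, choosing a memory advise hint for each kind of data from its access pattern, can be sketched in plain Python. This is only an illustration under stated assumptions: the enum values mirror the names of the CUDA runtime's cudaMemoryAdvise constants, but the buffer names, the access counts, and the selection heuristic are hypothetical and are not the authors' classification or results. No GPU call is made.

```python
from dataclasses import dataclass
from enum import Enum


class MemAdvise(Enum):
    """Values mirror the names of CUDA's cudaMemoryAdvise constants."""
    READ_MOSTLY = "cudaMemAdviseSetReadMostly"
    PREFERRED_LOCATION = "cudaMemAdviseSetPreferredLocation"
    ACCESSED_BY = "cudaMemAdviseSetAccessedBy"


@dataclass(frozen=True)
class AccessPattern:
    """Hypothetical summary of how one buffer is used in a training step."""
    gpu_reads: int
    gpu_writes: int
    cpu_accesses: int


def choose_advise(p: AccessPattern) -> MemAdvise:
    """Pick one hint from an access pattern (illustrative heuristic only)."""
    if p.gpu_writes == 0 and p.gpu_reads > 0:
        # Never written by the GPU: a read-mostly hint is a natural fit.
        return MemAdvise.READ_MOSTLY
    if p.cpu_accesses == 0:
        # Touched only by the GPU: prefer keeping it resident there.
        return MemAdvise.PREFERRED_LOCATION
    # Shared between CPU and GPU: hint that the GPU will access it.
    return MemAdvise.ACCESSED_BY


# Hypothetical buffers; names and counts are made up for illustration.
buffers = {
    "input_batch": AccessPattern(gpu_reads=2, gpu_writes=0, cpu_accesses=1),
    "weights": AccessPattern(gpu_reads=2, gpu_writes=1, cpu_accesses=0),
    "staging": AccessPattern(gpu_reads=1, gpu_writes=1, cpu_accesses=3),
}

plan = {name: choose_advise(p) for name, p in buffers.items()}
for name, advise in plan.items():
    print(f"{name}: {advise.value}")
```

In a real implementation each chosen hint would be passed to the CUDA runtime for a buffer allocated as Unified Memory; which hint actually helps which data type is an empirical question that the article addresses through experiments.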

Similar Items