Deep Learning Performance Characterization on GPUs for Various Quantization Frameworks

Bibliographic Details
Main Authors: Muhammad Ali Shafique, Arslan Munir, Joonho Kong
Format: Article
Language: English
Published: MDPI AG, 2023-10-01
Series: AI (ISSN 2673-2688), volume 4, issue 4, pages 926–948
DOI: 10.3390/ai4040047
Subjects: optimization; deep learning; quantization; performance; TensorRT; automatic mixed precision
Online Access: https://www.mdpi.com/2673-2688/4/4/47
Description: Deep learning is employed in many applications, such as computer vision, natural language processing, robotics, and recommender systems. Large and complex neural networks achieve high accuracy; however, they adversely affect many aspects of deep learning performance, such as training time, latency, throughput, energy consumption, and memory usage, in both the training and inference stages. To address these challenges, various optimization techniques and frameworks have been developed to make deep learning models perform efficiently during training and inference. Although optimization techniques such as quantization have been studied thoroughly in the past, less work has been done to study the performance of the frameworks that provide these quantization techniques. In this paper, we use different performance metrics to study the performance of various quantization frameworks, including TensorFlow automatic mixed precision and TensorRT. These performance metrics include training time and memory utilization in the training stage, along with latency and throughput on graphics processing units (GPUs) in the inference stage. We apply the automatic mixed precision (AMP) technique during the training stage using the TensorFlow framework, while for inference we utilize the TensorRT framework for post-training quantization via the TensorFlow TensorRT (TF-TRT) application programming interface (API). We profile different deep learning models, datasets, image sizes, and batch sizes for both the training and inference stages; the results can help developers and researchers devise and deploy efficient deep learning models for GPUs.
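As an illustration of the two techniques described above, the following is a minimal sketch of enabling AMP during training with TensorFlow's Keras mixed-precision API. The record does not specify the paper's actual models, datasets, or training configuration, so ResNet50, the input shape, the 10-class head, and the train_ds pipeline are placeholders.

    import tensorflow as tf

    # Enable automatic mixed precision (AMP): most layers compute in float16
    # on Tensor Cores while variables are kept in float32.
    tf.keras.mixed_precision.set_global_policy("mixed_float16")

    inputs = tf.keras.Input(shape=(224, 224, 3))
    x = tf.keras.applications.ResNet50(weights=None, include_top=False,
                                       pooling="avg")(inputs)
    # Keep the final classification layer in float32 for numerical stability,
    # as the Keras mixed-precision guide recommends.
    outputs = tf.keras.layers.Dense(10, activation="softmax", dtype="float32")(x)
    model = tf.keras.Model(inputs, outputs)

    # Under the mixed_float16 policy, compile() wraps the optimizer in a
    # LossScaleOptimizer, so dynamic loss scaling is handled automatically.
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(train_ds, epochs=5)  # train_ds: a hypothetical tf.data pipeline

For the inference side, the sketch below shows post-training quantization of a SavedModel through the TF-TRT API the abstract mentions. The model directory names are hypothetical, and the commented-out INT8 variant assumes a representative calibration dataset (calib_ds).

    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    # Convert a trained SavedModel into a TensorRT-optimized SavedModel with
    # FP16 post-training quantization.
    converter = trt.TrtGraphConverterV2(
        input_saved_model_dir="resnet50_saved_model",  # hypothetical path
        precision_mode=trt.TrtPrecisionMode.FP16,
    )
    converter.convert()
    converter.save("resnet50_trt_fp16")  # hypothetical output path

    # INT8 additionally requires a calibration function that yields
    # representative input batches:
    # converter = trt.TrtGraphConverterV2(
    #     input_saved_model_dir="resnet50_saved_model",
    #     precision_mode=trt.TrtPrecisionMode.INT8,
    # )
    # converter.convert(calibration_input_fn=lambda: ((b,) for b in calib_ds))
    # converter.save("resnet50_trt_int8")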
Author Affiliations:
Muhammad Ali Shafique: Department of Electrical and Computer Engineering, Kansas State University, Manhattan, KS 66506, USA
Arslan Munir: Department of Computer Science, Kansas State University, Manhattan, KS 66506, USA
Joonho Kong: School of Electronic and Electrical Engineering, Kyungpook National University, Daegu 41566, Republic of Korea