Comparative evaluation of deep learning workloads for leadership-class systems

Deep learning (DL) workloads and their performance at scale are becoming important factors to consider as we design, develop and deploy next-generation high-performance computing systems. Since DL applications rely heavily on DL frameworks and underlying compute (CPU/GPU) stacks, it is essential to gain a holistic understanding from compute kernels, models, and frameworks of popular DL stacks, and to assess their impact on science-driven, mission-critical applications. At Oak Ridge Leadership Computing Facility (OLCF), we employ a set of micro and macro DL benchmarks established through the Collaboration of Oak Ridge, Argonne, and Livermore (CORAL) to evaluate the AI readiness of our next-generation supercomputers. In this paper, we present our early observations and performance benchmark comparisons between the Nvidia V100-based Summit system with its CUDA stack and an AMD MI100-based testbed system with its ROCm stack. We take a layered perspective on DL benchmarking and point to opportunities for future optimizations in the technologies that we consider.

Bibliographic Details
Main Authors:	Junqi Yin, Aristeidis Tsaris, Sajal Dash, Ross Miller, Feiyi Wang, Mallikarjun (Arjun) Shankar
Corresponding Author:	Junqi Yin
Affiliation:	Oak Ridge National Laboratory, United States of America
Format:	Article
Language:	English
Published:	KeAi Communications Co. Ltd. 2021-10-01
Series:	BenchCouncil Transactions on Benchmarks, Standards and Evaluations, vol. 1, no. 1, article 100005
ISSN:	2772-4859
Subjects:	CORAL benchmark; Deep learning stack; ROCm
Collection:	Directory of Open Access Journals (DOAJ)
Online Access:	http://www.sciencedirect.com/science/article/pii/S2772485921000053