Comparative evaluation of deep learning workloads for leadership-class systems

Deep learning (DL) workloads and their performance at scale are becoming important factors to consider as we design, develop and deploy next-generation high-performance computing systems. Since DL applications rely heavily on DL frameworks and underlying compute (CPU/GPU) stacks, it is essential to gain a holistic understanding from compute kernels, models, and frameworks of popular DL stacks, and to assess their impact on science-driven, mission-critical applications. At Oak Ridge Leadership Computing Facility (OLCF), we employ a set of micro and macro DL benchmarks established through the Collaboration of Oak Ridge, Argonne, and Livermore (CORAL) to evaluate the AI readiness of our next-generation supercomputers. In this paper, we present our early observations and performance benchmark comparisons between the Nvidia V100 based Summit system with its CUDA stack and an AMD MI100 based testbed system with its ROCm stack. We take a layered perspective on DL benchmarking and point to opportunities for future optimizations in the technologies that we consider.

Bibliographic Details
Main Authors: Junqi Yin, Aristeidis Tsaris, Sajal Dash, Ross Miller, Feiyi Wang, Mallikarjun (Arjun) Shankar (Oak Ridge National Laboratory, United States of America)
Format: Article
Language: English
Published: KeAi Communications Co. Ltd. 2021-10-01
Series: BenchCouncil Transactions on Benchmarks, Standards and Evaluations
ISSN: 2772-4859
Subjects: CORAL benchmark; Deep learning stack; ROCm
Online Access: http://www.sciencedirect.com/science/article/pii/S2772485921000053