Benchmarking GPU Tensor Cores on General Matrix Multiplication Kernels through CUTLASS
GPUs have been broadly used to accelerate big data analytics, scientific computing and machine intelligence. Particularly, matrix multiplication and convolution are two principal operations that use a large proportion of steps in modern data analysis and deep neural networks. These performance-criti...
Main Authors: | Xuanteng Huang, Xianwei Zhang, Panfei Yang, Nong Xiao |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-12-01 |
Series: | Applied Sciences |
Online Access: | https://www.mdpi.com/2076-3417/13/24/13022 |
Similar Items
- Prospects of GPU Tensor Core Correlation for the SMA and the ngEHT
  by: Wei Yu, et al.
  Published: (2023-01-01)
- OpSparse: A Highly Optimized Framework for Sparse General Matrix Multiplication on GPUs
  by: Zhaoyang Du, et al.
  Published: (2022-01-01)
- Numerical behavior of NVIDIA tensor cores
  by: Massimiliano Fasi, et al.
  Published: (2021-02-01)
- RayBench: An Advanced NVIDIA-Centric GPU Rendering Benchmark Suite for Optimal Performance Analysis
  by: Peng Wang, et al.
  Published: (2023-10-01)
- An Approximate GEMM Unit for Energy-Efficient Object Detection
  by: Ratko Pilipović, et al.
  Published: (2021-06-01)