Fluid simulations accelerated with 16 bits: Approaching 4x speedup on A64FX by squeezing ShallowWaters.jl into Float16

Most Earth-system simulations run on conventional central processing units in 64-bit double precision floating-point numbers Float64, although the need for high-precision calculations in the presence of large uncertainties has been questioned. Fugaku, currently the world's fastest supercomputer...

Bibliographic Details
Main Authors:	Klöwer, M; Hatfield, S; Croci, M; Düben, PD; Palmer, TN
Format:	Journal article
Language:	English
Published:	Wiley 2022

Description:	Most Earth-system simulations run on conventional central processing units in 64-bit double-precision floating-point numbers (Float64), although the need for high-precision calculations in the presence of large uncertainties has been questioned. Fugaku, currently the world's fastest supercomputer, is based on A64FX microprocessors, which also support the 16-bit low-precision format Float16. We investigate the Float16 performance on A64FX with ShallowWaters.jl, the first fluid circulation model that runs entirely with 16-bit arithmetic. The model implements techniques that address precision and dynamic range issues in 16 bits. The precision-critical time integration is augmented to include compensated summation to minimize rounding errors. Such a compensated time integration is as precise as, but faster than, mixed precision with 16 and 32-bit floats. As subnormals are inefficiently supported on A64FX, the very limited range available in Float16 is 6 × 10⁻⁵ to 65,504. We develop the analysis-number format Sherlogs.jl to log the arithmetic results during the simulation. The equations in ShallowWaters.jl are then systematically rescaled to fit into Float16, using 97% of the available representable numbers. Consequently, we benchmark speedups of up to 3.8x on A64FX with Float16. Adding a compensated time integration, speedups reach up to 3.6x. Although ShallowWaters.jl is simplified compared to large Earth-system models, it shares essential algorithms and therefore shows that 16-bit calculations are indeed a competitive way to accelerate Earth-system simulations on available hardware.
Identifier:	oxford-uuid:dfc6648e-f50f-43ae-98dd-e92bb3482059
Institution:	University of Oxford
