Nibbling MAYO: Optimized Implementations for AVX2 and Cortex-M4

MAYO is a popular high-calorie condiment as well as an auspicious candidate in the ongoing NIST competition for additional post-quantum signature schemes achieving competitive signature and public key sizes. In this work, we present high-speed implementations of MAYO using the AVX2 and Armv7E-M ins...


Bibliographic Details
Main Authors: Ward Beullens, Fabio Campos, Sofía Celi, Basil Hess, Matthias J. Kannwischer
Format: Article
Language: English
Published: Ruhr-Universität Bochum 2024-03-01
Series: Transactions on Cryptographic Hardware and Embedded Systems
Subjects: MAYO; Oil and Vinegar; Arm Cortex-M4; AVX2; NIST PQC
Online Access: https://tches.iacr.org/index.php/TCHES/article/view/11427
author Ward Beullens
Fabio Campos
Sofía Celi
Basil Hess
Matthias J. Kannwischer
collection DOAJ
description MAYO is a popular high-calorie condiment as well as an auspicious candidate in the ongoing NIST competition for additional post-quantum signature schemes achieving competitive signature and public key sizes. In this work, we present high-speed implementations of MAYO using the AVX2 and Armv7E-M instruction sets targeting recent x86 platforms and the Arm Cortex-M4. Moreover, the main contribution of our work is showing that MAYO can be even faster when switching from a bitsliced representation of keys to a nibble-sliced representation. While the bitsliced representation was primarily motivated by faster arithmetic on microcontrollers, we show that it is not necessary for achieving high performance on Cortex-M4. On Cortex-M4, we instead propose to implement the large matrix multiplications of MAYO using the Method of the Four Russians (M4R), which allows us to achieve better performance than when using the bitsliced approach. This results in up to 21% faster signing. For AVX2, the change in representation allows us to implement the arithmetic much faster using shuffle instructions. Signing takes up to 3.2x fewer cycles and key generation and verification enjoy similar speedups. This shows that MAYO is competitive with lattice-based signature schemes on x86 CPUs, and a factor of 2-6 slower than lattice-based signature schemes on Cortex-M4 (which can still be considered competitive).
first_indexed 2024-04-24T23:53:49Z
format Article
id doaj.art-33c277bf61a74fc4a5f503520d1aaf31
institution Directory Open Access Journal
issn 2569-2925
language English
last_indexed 2024-04-24T23:53:49Z
publishDate 2024-03-01
publisher Ruhr-Universität Bochum
record_format Article
series Transactions on Cryptographic Hardware and Embedded Systems
spelling doaj.art-33c277bf61a74fc4a5f503520d1aaf31
2024-03-14T16:24:48Z
eng
Volume 2024, Issue 2
DOI 10.46586/tches.v2024.i2.252-275
Ward Beullens (IBM Research Europe, Zurich, Switzerland)
Fabio Campos (RheinMain University of Applied Sciences, Wiesbaden, Germany)
Sofía Celi (Brave Software, San Francisco, California)
Basil Hess (IBM Research Europe, Zurich, Switzerland)
Matthias J. Kannwischer (Quantum Safe Migration Center, Chelpis Quantum Tech, Taipei, Taiwan)
title Nibbling MAYO: Optimized Implementations for AVX2 and Cortex-M4
topic MAYO
Oil and Vinegar
Arm Cortex-M4
AVX2
NIST PQC
url https://tches.iacr.org/index.php/TCHES/article/view/11427
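
The abstract contrasts a bitsliced key representation with a nibble-sliced one and notes that, on AVX2, the nibble form lets the arithmetic be done with shuffle instructions. The Python sketch below is only an illustration of that idea, not the authors' implementation. It assumes that field elements are 4-bit values in GF(16) reduced modulo x^4 + x + 1, and that two elements are packed per byte with the low nibble first; the function names are hypothetical. Multiplying a packed vector by a scalar then reduces to a 16-entry table lookup per nibble, which is the kind of operation a byte shuffle such as AVX2's `vpshufb` performs on many lanes at once.

```python
# Illustrative sketch only; names and packing order are assumptions.
# Assumed field: GF(16) = GF(2)[x] / (x^4 + x + 1).

def gf16_mul(a: int, b: int) -> int:
    """Multiply two GF(16) elements given as 4-bit integers."""
    r = 0
    # Carry-less (XOR) schoolbook multiplication.
    for i in range(4):
        if (b >> i) & 1:
            r ^= a << i
    # Reduce terms of degree 6, 5, 4 using x^4 = x + 1.
    for deg in (6, 5, 4):
        if (r >> deg) & 1:
            r ^= 0b10011 << (deg - 4)
    return r & 0xF


def pack_nibbles(elems):
    """Pack GF(16) elements two per byte, low nibble first (assumed order)."""
    elems = list(elems)
    if len(elems) % 2:
        elems.append(0)  # pad to a whole number of bytes
    return bytes(elems[i] | (elems[i + 1] << 4) for i in range(0, len(elems), 2))


def unpack_nibbles(data: bytes, n: int):
    """Recover the first n GF(16) elements from packed bytes."""
    out = []
    for byte in data:
        out.append(byte & 0xF)
        out.append(byte >> 4)
    return out[:n]


def scalar_mul_packed(data: bytes, scalar: int) -> bytes:
    """Multiply every packed element by `scalar` via a 16-entry lookup table.

    A SIMD byte shuffle would apply the same table to many nibbles at once;
    here the lookup is done one byte at a time for clarity.
    """
    table = [gf16_mul(scalar, v) for v in range(16)]
    return bytes(table[b & 0xF] | (table[b >> 4] << 4) for b in data)


if __name__ == "__main__":
    packed = pack_nibbles([1, 2, 3])
    print(unpack_nibbles(scalar_mul_packed(packed, 2), 3))
```

Building the table costs 16 field multiplications per scalar, after which each element needs only a lookup; this trade is what makes the nibble layout attractive when the lookup itself is a single vector instruction.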