Nibbling MAYO: Optimized Implementations for AVX2 and Cortex-M4
MAYO is a popular high-calorie condiment as well as an auspicious candidate in the ongoing NIST competition for additional post-quantum signature schemes achieving competitive signature and public key sizes. In this work, we present high-speed implementations of MAYO using the AVX2 and Armv7E-M ins...
Main Authors: | Ward Beullens, Fabio Campos, Sofía Celi, Basil Hess, Matthias J. Kannwischer |
---|---|
Format: | Article |
Language: | English |
Published: | Ruhr-Universität Bochum, 2024-03-01 |
Series: | Transactions on Cryptographic Hardware and Embedded Systems |
Subjects: | MAYO; Oil and Vinegar; Arm Cortex-M4; AVX2; NIST PQC |
Online Access: | https://tches.iacr.org/index.php/TCHES/article/view/11427 |
_version_ | 1797262230871343104 |
---|---|
author | Ward Beullens; Fabio Campos; Sofía Celi; Basil Hess; Matthias J. Kannwischer |
author_sort | Ward Beullens |
collection | DOAJ |
description |
MAYO is a popular high-calorie condiment as well as an auspicious candidate in the ongoing NIST competition for additional post-quantum signature schemes achieving competitive signature and public key sizes. In this work, we present high-speed implementations of MAYO using the AVX2 and Armv7E-M instruction sets targeting recent x86 platforms and the Arm Cortex-M4. Moreover, the main contribution of our work is showing that MAYO can be even faster when switching from a bitsliced representation of keys to a nibble-sliced representation. While the bitsliced representation was primarily motivated by faster arithmetic on microcontrollers, we show that it is not necessary for achieving high performance on Cortex-M4. On Cortex-M4, we instead propose to implement the large matrix multiplications of MAYO using the Method of the Four Russians (M4R), which allows us to achieve better performance than when using the bitsliced approach. This results in up to 21% faster signing. For AVX2, the change in representation allows us to implement the arithmetic much faster using shuffle instructions. Signing takes up to 3.2x fewer cycles and key generation and verification enjoy similar speedups. This shows that MAYO is competitive with lattice-based signature schemes on x86 CPUs, and a factor of 2-6 slower than lattice-based signature schemes on Cortex-M4 (which can still be considered competitive).
|
first_indexed | 2024-04-24T23:53:49Z |
format | Article |
id | doaj.art-33c277bf61a74fc4a5f503520d1aaf31 |
institution | Directory Open Access Journal |
issn | 2569-2925 |
language | English |
last_indexed | 2024-04-24T23:53:49Z |
publishDate | 2024-03-01 |
publisher | Ruhr-Universität Bochum |
record_format | Article |
series | Transactions on Cryptographic Hardware and Embedded Systems |
spelling |
Record: doaj.art-33c277bf61a74fc4a5f503520d1aaf31 (2024-03-14T16:24:48Z, eng)
Publisher: Ruhr-Universität Bochum
Series: Transactions on Cryptographic Hardware and Embedded Systems, ISSN 2569-2925, 2024-03-01, volume 2024, issue 2
DOI: 10.46586/tches.v2024.i2.252-275
Title: Nibbling MAYO: Optimized Implementations for AVX2 and Cortex-M4
Authors and affiliations:
Ward Beullens (IBM Research Europe, Zurich, Switzerland)
Fabio Campos (RheinMain University of Applied Sciences, Wiesbaden, Germany)
Sofía Celi (Brave Software, San Francisco, California)
Basil Hess (IBM Research Europe, Zurich, Switzerland)
Matthias J. Kannwischer (Quantum Safe Migration Center, Chelpis Quantum Tech, Taipei, Taiwan)
URL: https://tches.iacr.org/index.php/TCHES/article/view/11427
Keywords: MAYO; Oil and Vinegar; Arm Cortex-M4; AVX2; NIST PQC
|
title | Nibbling MAYO: Optimized Implementations for AVX2 and Cortex-M4 |
title_sort | nibbling mayo optimized implementations for avx2 and cortex m4 |
topic | MAYO; Oil and Vinegar; Arm Cortex-M4; AVX2; NIST PQC |
url | https://tches.iacr.org/index.php/TCHES/article/view/11427 |
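The abstract contrasts a bitsliced key representation with a nibble-sliced one and mentions table-based techniques (shuffle instructions on AVX2, the Method of the Four Russians on Cortex-M4). The following is a minimal sketch of those ideas in plain Python, not the authors' implementation. It assumes the field GF(16) is defined modulo x^4 + x + 1, and all function names (`gf16_mul`, `pack_nibbles`, `scalar_mul_packed`, `matmul_table`) are hypothetical. Each byte holds two field elements, a 16-entry table plays the role of a byte shuffle, and the matrix product precomputes all 16 multiples of each row so that every later multiplication becomes a lookup plus XOR.

```python
def gf16_mul(a, b):
    """Multiply two GF(16) elements; reduction polynomial x^4 + x + 1 is an assumption."""
    r = 0
    for i in range(4):
        if (b >> i) & 1:
            r ^= a << i          # carry-less (polynomial) multiplication
    for i in range(6, 3, -1):    # reduce bits 6..4 back into bits 3..0
        if (r >> i) & 1:
            r ^= 0b10011 << (i - 4)
    return r


# 16x16 multiplication table; one row is the lookup table a byte shuffle would hold.
MUL_TABLE = [[gf16_mul(a, b) for b in range(16)] for a in range(16)]


def pack_nibbles(elems):
    """Pack an even-length list of GF(16) elements two per byte, low nibble first."""
    assert len(elems) % 2 == 0
    return bytes(elems[i] | (elems[i + 1] << 4) for i in range(0, len(elems), 2))


def unpack_nibbles(packed):
    """Inverse of pack_nibbles."""
    out = []
    for byte in packed:
        out.append(byte & 0xF)
        out.append(byte >> 4)
    return out


def scalar_mul_packed(packed, c):
    """Multiply every nibble of a packed vector by c using one table lookup per nibble."""
    t = MUL_TABLE[c]
    return bytes(t[b & 0xF] | (t[b >> 4] << 4) for b in packed)


def xor_bytes(x, y):
    """Addition in GF(16) is XOR, so packed vectors add bytewise."""
    return bytes(p ^ q for p, q in zip(x, y))


def matmul_table(a_rows, b_rows_packed):
    """Compute A * B over GF(16) with precomputed multiples of each row of B.

    a_rows: list of rows, each a list of GF(16) elements.
    b_rows_packed: list of nibble-packed rows of B.
    """
    # Precompute c * B[k] for all 16 values of c, once per row of B.
    tables = [[scalar_mul_packed(row, c) for c in range(16)] for row in b_rows_packed]
    width = len(b_rows_packed[0])
    result = []
    for a_row in a_rows:
        acc = bytes(width)
        for k, coeff in enumerate(a_row):
            acc = xor_bytes(acc, tables[k][coeff])   # lookup instead of multiply
        result.append(acc)
    return result
```

The table precomputation pays off when the left matrix has many rows, since each row of the right matrix is then reused many times. The scheduling and register-level details in the paper's actual Cortex-M4 and AVX2 code are likely to differ from this sketch.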