openssl

Daniel Hu 3f42f41ad1 Improve chacha20 performance on aarch64 by interleaving scalar with SVE/SVE2

The patch processes one extra block with scalar instructions, in parallel with the blocks handled by SVE/SVE2. This is especially helpful when only a 128-bit vector length is available.

The actual performance uplift is complicated and depends on the vector length and the input data size. The SVE/SVE2 implementation doesn't always perform better than Neon, but it should prevail in most cases.

On a CPU with 256-bit SVE/SVE2, interleaved processing can handle 9 blocks in parallel (8 blocks by SVE and 1 by scalar). On 128-bit SVE/SVE2 it is 5 blocks. An input size that is a multiple of 9 or 5 blocks on the respective CPU can typically be handled at maximum speed.

Here are test data for 256-bit and 128-bit SVE/SVE2, obtained by running "openssl speed -evp chacha20 -bytes 576" (and other sizes):

----------------------------------+---------------------------------
           256-bit SVE            |          128-bit SVE2
----------------------------------|---------------------------------
Input    576 bytes     512 bytes  |    320 bytes     256 bytes
----------------------------------|---------------------------------
SVE    1716361.91k   1556699.18k  |  1615789.06k   1302864.40k
----------------------------------|---------------------------------
Neon   1262643.44k   1509044.05k  |   680075.67k   1060532.31k
----------------------------------+---------------------------------

If the input size gets very large, the advantage of SVE/SVE2 over Neon will fade out.

Signed-off-by: Daniel Hu <Daniel.Hu@arm.com>
Change-Id: Ieedfcb767b9c08280d7c8c9a8648919c69728fab
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/18901)		2022-09-01 18:01:19 +10:00
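The block counts quoted in the commit message (9 blocks at 256 bits, 5 blocks at 128 bits) and the benchmark input sizes (576 and 320 bytes) can be reproduced with a little arithmetic. The sketch below is not code from the patch. It assumes one ChaCha20 block per 32-bit vector lane, which is inferred from the quoted figures of 8 and 4 SVE blocks, plus the one scalar block. The helper names `blocks_per_pass`, `bytes_per_pass` and `is_full_pass_multiple` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define CHACHA_BLOCK_BYTES 64u /* a ChaCha20 block is 64 bytes */

/*
 * Blocks handled in one interleaved pass.
 * Assumption (inferred from the commit message, not taken from the asm):
 * one block per 32-bit lane of the vector, plus one block done by scalar.
 */
static unsigned blocks_per_pass(unsigned vl_bits)
{
    /* SVE vector lengths are multiples of 128 bits */
    assert(vl_bits >= 128 && vl_bits % 128 == 0);
    return vl_bits / 32u + 1u;
}

/* Bytes consumed by one interleaved pass. */
static size_t bytes_per_pass(unsigned vl_bits)
{
    return (size_t)blocks_per_pass(vl_bits) * CHACHA_BLOCK_BYTES;
}

/* Non-zero when len is a whole number of interleaved passes. */
static int is_full_pass_multiple(size_t len, unsigned vl_bits)
{
    return len != 0 && len % bytes_per_pass(vl_bits) == 0;
}
```

Under these assumptions, `bytes_per_pass(256)` is 576 and `bytes_per_pass(128)` is 320, which matches the first input size in each half of the table. The other two sizes, 512 and 256 bytes, correspond to 8 and 4 blocks, that is, the vector lanes alone without the extra scalar block.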
asm	Improve chacha20 performance on aarch64 by interleaving scalar with SVE/SVE2	2022-09-01 18:01:19 +10:00
build.info	Generate the preprocessed .s files for chacha and poly 1305 on ia64	2022-05-27 08:10:49 +02:00
chacha_enc.c	Add ROTATE inline RISC-V zbb/zbkb asm for chacha	2022-07-13 18:15:12 +01:00
chacha_ppc.c	Update copyright year	2022-05-03 13:34:51 +01:00