Fix performance regression of ChaCha20 on LoongArch64

The regression was introduced in PR #22817.

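In that pull request, the input length check was moved forward into
ChaCha20_ctr32, but the ori instruction that loads the threshold 64
into $t3 was left behind in the .LChaCha20_4x and .LChaCha20_8x
paths. As a result, $t3 was not initialized when the check ran, and
input of any length fell through to the much slower scalar
implementation. This commit sets $t3 right after the zero-length
check and drops the two now-redundant ori instructions.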
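A minimal Python model of the dispatch may make the failure mode concrete. It is a sketch, not the OpenSSL code. The function names, the path labels, and the exact comparison (scalar path when len <= $t3) are assumptions, because the diff does not show the moved check itself. The large stale $t3 value is also hypothetical. It was chosen only to reproduce the reported symptom.

```python
"""Hypothetical model of the ChaCha20_ctr32 path selection on LoongArch64.

Assumption (not shown in the diff): the moved length check compares
$len against $t3 and takes the scalar path when len <= $t3.
"""

U64_MAX = 2**64 - 1  # largest value a 64-bit register can hold


def dispatch(length: int, t3: int, has_lsx: bool, has_lasx: bool) -> str:
    """Return the label of the path the hypothetical dispatcher picks."""
    if length == 0:
        return "no_data"      # beqz $len,.Lno_data
    if length <= t3:          # assumed unsigned compare against $t3
        return "scalar_1x"
    if has_lasx:
        return "lasx_8x"      # .LChaCha20_8x
    if has_lsx:
        return "lsx_4x"       # .LChaCha20_4x
    return "scalar_1x"        # no SIMD support: scalar fallback


def fixed_dispatch(length: int, has_lsx: bool, has_lasx: bool) -> str:
    # ori $t3,$zero,64 ORs the immediate 64 into the zero register, so $t3 = 64
    t3 = 0 | 64
    return dispatch(length, t3, has_lsx, has_lasx)


def buggy_dispatch(length: int, stale_t3: int,
                   has_lsx: bool, has_lasx: bool) -> str:
    # The ori is missing, so $t3 still holds whatever value it had on entry
    return dispatch(length, stale_t3, has_lsx, has_lasx)


if __name__ == "__main__":
    print(fixed_dispatch(4096, True, True))            # lasx_8x
    print(buggy_dispatch(4096, U64_MAX, True, True))   # scalar_1x
```

With the threshold set, a 4096-byte input reaches the LASX path. With a large stale $t3, the same input is routed to the scalar code even though SIMD support is present.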

Fixes #23300

CLA: trivial

Reviewed-by: Shane Lontis <shane.lontis@oracle.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/23301)
Lin Runze 2024-01-14 20:21:49 +08:00 committed by Tomas Mraz
parent 2f85736e9c
commit 971028535e
1 changed file with 1 addition and 2 deletions


@@ -71,6 +71,7 @@ ChaCha20_ctr32:
 # $a4 = arg #5 (counter array)
 beqz $len,.Lno_data
+ori $t3,$zero,64
 la.pcrel $t0,OPENSSL_loongarch_hwcap_P
 ld.w $t0,$t0,0
@@ -461,7 +462,6 @@ EOF
 $code .= <<EOF;
 .align 6
 .LChaCha20_4x:
-ori $t3,$zero,64
 addi.d $sp,$sp,-128
 # Save the initial block counter in $t4
@@ -886,7 +886,6 @@ EOF
 $code .= <<EOF;
 .align 6
 .LChaCha20_8x:
-ori $t3,$zero,64
 addi.d $sp,$sp,-128
 # Save the initial block counter in $t4