Simplify the constant swap function.
On amd64: Replace the CMOVQEQ scheme with SSE2 code similar to the non-amd64 code.
On non-amd64: Avoid unnecessary loop iterations.
The result is less code that also runs slightly faster.
name              old time/op  new time/op  delta
ScalarBaseMult-4  653µs ± 0%   636µs ± 0%   ~        (p=0.100 n=3+3)

name              old time/op  new time/op  delta
ConstantSwap-4    10.4ns ± 1%  6.2ns ± 0%   -39.86%  (p=0.029 n=4+4)
Measured on an i7-6500U.
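
For readers unfamiliar with the technique, a branch-free conditional swap can be sketched in Go roughly as follows. This is an illustrative sketch only, not the code in this change: the names fieldElement and cswap, and the 10-limb int32 layout, are assumptions made for the example. The whole array is processed in a single pass, and the swap bit is expanded into an all-zeros or all-ones mask so that no branch depends on it.

```go
package main

// fieldElement is a hypothetical 10-limb field element used only for
// this illustration.
type fieldElement [10]int32

// cswap swaps f and g when b == 1 and leaves both unchanged when b == 0.
// b must be 0 or 1. The same instructions run for either value of b.
func cswap(f, g *fieldElement, b int32) {
	mask := -b // 0 when b == 0, all bits set when b == 1
	for i := range f {
		t := mask & (f[i] ^ g[i]) // 0, or the bitwise difference of the limbs
		f[i] ^= t
		g[i] ^= t
	}
}
```

Calling cswap(&x, &y, 1) exchanges the contents of x and y, while cswap(&x, &y, 0) performs the same loads, XORs, and stores but leaves both values as they were.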
Change-Id: Ia5eea92e0b3eabb6c291d25229aa582b51278552
Reviewed-on: https://go-review.googlesource.com/39693
Reviewed-by: Adam Langley <agl@golang.org>
Run-TryBot: Adam Langley <agl@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This consists of ~2000 lines of amd64 assembly and a much slower
generic Go version in curve25519.go. The assembly has been ported from
djb's public domain sources; the only semantic alterations are those
needed to deal with Go's split stacks.
R=rsc
CC=golang-dev
https://golang.org/cl/5786045