The Keccak team published the Keccak Code Package following the SHA-3
workshop organized by NIST in 2014. It contains optimized implementations
of the Keccak functions for various architectures. This CL converts the
GNU assembly code of the Keccak permutation for the x86_64 architecture
into Go assembly.

The code here is an almost identical copy of KeccakF1600_StatePermute.
The only modification is that it converts the input state into the
implementation's internal representation on entry and back again before
returning. This keeps the algorithm in-place and avoids requiring extra
external state initialization and data XORs before and after the
permutation.
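The in-place conversion described above can be sketched roughly as follows. This is an illustration, not the code in this CL. It assumes the internal representation is a lane complementing transform in which lanes 1, 2, 8, 12, 17 and 20 are stored bit-inverted; the names `toggleInternal`, `permuteInPlace` and the `core` parameter are hypothetical.

```go
package main

import "fmt"

// complementedLanes lists the lanes assumed to be stored bit-inverted in the
// internal representation (an assumption, not taken from this CL).
var complementedLanes = [...]int{1, 2, 8, 12, 17, 20}

// toggleInternal converts between the external and the internal
// representation. Inversion is its own inverse, so one helper does both.
func toggleInternal(a *[25]uint64) {
	for _, i := range complementedLanes {
		a[i] = ^a[i]
	}
}

// permuteInPlace wraps a core permutation that expects and produces the
// internal representation, so callers only ever see the external one.
func permuteInPlace(a *[25]uint64, core func(*[25]uint64)) {
	toggleInternal(a) // external -> internal
	core(a)
	toggleInternal(a) // internal -> external
}

func main() {
	var state [25]uint64
	// A no-op stands in for the assembly core, purely to show the round trip.
	permuteInPlace(&state, func(*[25]uint64) {})
	fmt.Println(state == [25]uint64{}) // prints "true"
}
```

Because the conversion happens inside the wrapper, callers can absorb data with plain XORs into the state and need no knowledge of the internal layout.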
The speed difference is:
benchmark                          old ns/op     new ns/op     delta
BenchmarkPermutationFunction-8     476           411           -13.66%
BenchmarkSha3_512_MTU-8            9910          8681          -12.40%
BenchmarkSha3_384_MTU-8            7124          6249          -12.28%
BenchmarkSha3_256_MTU-8            5666          4986          -12.00%
BenchmarkSha3_224_MTU-8            5401          4750          -12.05%
BenchmarkShake128_MTU-8            4614          3980          -13.74%
BenchmarkShake256_MTU-8            4935          4295          -12.97%
BenchmarkShake256_16x-8            71850         63798         -11.21%
BenchmarkShake256_1MiB-8           3784244       3285733       -13.17%
BenchmarkSha3_512_1MiB-8           7098875       6163359       -13.18%

benchmark                          old MB/s      new MB/s      speedup
BenchmarkPermutationFunction-8     420.11        486.35        1.16x
BenchmarkSha3_512_MTU-8            136.22        155.51        1.14x
BenchmarkSha3_384_MTU-8            189.49        216.03        1.14x
BenchmarkSha3_256_MTU-8            238.23        270.71        1.14x
BenchmarkSha3_224_MTU-8            249.91        284.19        1.14x
BenchmarkShake128_MTU-8            292.58        339.15        1.16x
BenchmarkShake256_MTU-8            273.53        314.28        1.15x
BenchmarkShake256_16x-8            228.03        256.81        1.13x
BenchmarkShake256_1MiB-8           277.09        319.13        1.15x
BenchmarkSha3_512_1MiB-8           147.71        170.13        1.15x
For further details, please see:
- http://csrc.nist.gov/groups/ST/hash/sha-3/Aug2014/documents/vanassche_keccak_code.pdf
- https://github.com/gvanas/KeccakCodePackage
Change-Id: I5b0b9395bba7d8c9acfe2b9c79f6e9c2cf858c7c
Reviewed-on: https://go-review.googlesource.com/17962
Reviewed-by: Adam Langley <agl@golang.org>