Andreas Auernhammer
f671756e04
blake2b: fix AVX performance problems on amd64
On some amd64 CPUs (Xeon E5-2680v4 / E5-2620v3), mixing SSE and AVX instructions
leads to very low performance.
On an i7-6500U, the mixed SSE/AVX code performs as follows:
AVX2:
name        time/op
Write128-4   165ns ± 0%
Write1K-4   1.20µs ± 0%
Sum128-4     189ns ± 1%
Sum1K-4     1.22µs ± 0%

name        speed
Write128-4  773MB/s ± 1%
Write1K-4   855MB/s ± 0%
Sum128-4    675MB/s ± 1%
Sum1K-4     838MB/s ± 0%
while the same code achieves less than 65MB/s on a Xeon E5-2620v3.
Replacing the SSE instructions `MOVQ` and `PINSRQ` with their AVX counterparts `VMOVQ` and `VPINSRQ`
brings the performance of the AVX/AVX2 code up to the expected level:
name         old time/op     new time/op      delta
Write128-12  2.20µs ±10%     0.22µs ± 9%        -90.00%  (p=0.029 n=4+4)
Write1K-12   16.2µs ± 0%      1.1µs ± 0%        -93.07%  (p=0.029 n=4+4)
Sum128-12    2.10µs ± 0%     0.22µs ± 0%        -89.47%  (p=0.029 n=4+4)
Sum1K-12     16.3µs ± 0%      1.2µs ± 0%        -92.65%  (p=0.029 n=4+4)

name         old speed       new speed        delta
Write128-12  58.5MB/s ±10%   582.8MB/s ±10%    +897.08%  (p=0.029 n=4+4)
Write1K-12   63.1MB/s ± 0%   909.8MB/s ± 0%   +1341.40%  (p=0.029 n=4+4)
Sum128-12    60.8MB/s ± 0%   576.3MB/s ± 0%    +847.84%  (p=0.029 n=4+4)
Sum1K-12     62.8MB/s ± 0%   855.2MB/s ± 0%   +1260.78%  (p=0.029 n=4+4)
The AVX/AVX2 code now uses only AVX (no SSE) instructions.
Fixes golang/go#18563.
Change-Id: I1961dd8fa02014642587523b7f099816a263c9f5
Reviewed-on: https://go-review.googlesource.com/34993
Reviewed-by: Adam Langley <agl@golang.org>
2017-02-08 19:53:58 +00:00
Mikio Hara
f6b343c37c
blake2b: fix build on non-amd64 platforms
Change-Id: Ib9ebb1a2eff4b61f60453086be5c63ac7af1f7fc
Reviewed-on: https://go-review.googlesource.com/34672
Run-TryBot: Mikio Hara <mikioh.mikioh@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@golang.org>
2016-12-21 23:57:47 +00:00
Andreas Auernhammer
d8e61c69ab
blake2b: add AVX assembly
Add an AVX implementation and improve SSE4.1 assembly.
AVX vs SSE4.1
name        old time/op    new time/op    delta
Write128-8   249ns ± 0%     220ns ± 0%    -11.85%  (p=0.029 n=4+4)
Write1K-8   1.68µs ± 1%    1.56µs ± 1%     -6.71%  (p=0.029 n=4+4)
Write32K-8  52.6µs ± 0%    48.7µs ± 0%     -7.40%  (p=0.029 n=4+4)
Sum128-8     264ns ± 0%     241ns ± 1%     -8.52%  (p=0.029 n=4+4)
Sum1K-8     1.70µs ± 0%    1.57µs ± 0%     -7.79%  (p=0.029 n=4+4)
Sum32K-8    54.1µs ± 3%    49.5µs ± 1%     -8.36%  (p=0.029 n=4+4)

name        old speed      new speed      delta
Write128-8  513MB/s ± 0%   582MB/s ± 0%   +13.38%  (p=0.029 n=4+4)
Write1K-8   610MB/s ± 1%   654MB/s ± 1%    +7.22%  (p=0.029 n=4+4)
Write32K-8  622MB/s ± 0%   672MB/s ± 0%    +7.99%  (p=0.029 n=4+4)
Sum128-8    484MB/s ± 1%   529MB/s ± 0%    +9.21%  (p=0.029 n=4+4)
Sum1K-8     602MB/s ± 0%   653MB/s ± 0%    +8.42%  (p=0.029 n=4+4)
Sum32K-8    607MB/s ± 3%   662MB/s ± 1%    +9.03%  (p=0.029 n=4+4)
AVX2 vs AVX
name        old time/op    new time/op    delta
Write128-4   192ns ± 0%     166ns ± 0%    -14.03%  (p=0.029 n=4+4)
Write1K-4   1.37µs ± 0%    1.19µs ± 0%    -12.65%  (p=0.029 n=4+4)
Write32K-4  42.5µs ± 0%    37.3µs ± 0%    -12.33%  (p=0.029 n=4+4)
Sum128-4     213ns ± 0%     188ns ± 0%    -11.97%  (p=0.029 n=4+4)
Sum1K-4     1.40µs ± 0%    1.22µs ± 0%    -12.85%  (p=0.029 n=4+4)
Sum32K-4    42.8µs ± 0%    37.3µs ± 0%    -12.94%  (p=0.029 n=4+4)

name        old speed      new speed      delta
Write128-4  662MB/s ± 0%   771MB/s ± 0%   +16.47%  (p=0.029 n=4+4)
Write1K-4   748MB/s ± 0%   857MB/s ± 0%   +14.49%  (p=0.029 n=4+4)
Write32K-4  771MB/s ± 0%   879MB/s ± 0%   +14.07%  (p=0.029 n=4+4)
Sum128-4    600MB/s ± 0%   680MB/s ± 0%   +13.49%  (p=0.029 n=4+4)
Sum1K-4     733MB/s ± 0%   841MB/s ± 0%   +14.72%  (p=0.029 n=4+4)
Sum32K-4    765MB/s ± 0%   879MB/s ± 0%   +14.85%  (p=0.029 n=4+4)
Change-Id: Idf85742e952c07b76c0c7fb5404ed9b0caf0f6eb
Reviewed-on: https://go-review.googlesource.com/34319
Reviewed-by: Adam Langley <agl@golang.org>
2016-12-20 18:11:55 +00:00
Austin Clements
e67f5eca87
blake2b: use proper Go frame sizes
Similar to the previous commit, blake2b's assembly routines claim they
have a zero byte frame and manually subtract a frame from the SP,
which can fail to grow the stack when necessary, leading to memory
corruption.
Fix this by using the correct stack frame sizes so the generated stack
growth prologue is correct, and aligning the SP up instead of down.
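The difference between aligning down and aligning up can be illustrated with plain integer arithmetic. This is only a sketch: the addresses, the 32-byte alignment, and the 64-byte scratch size are hypothetical values chosen for illustration, not taken from the assembly. It assumes the declared frame covers the bytes from SP upward, so an address rounded down from SP falls outside the frame, while one rounded up stays inside it as long as the frame includes alignment slack.

```go
package main

import "fmt"

// alignDown rounds addr down to a multiple of align (align must be a power of two).
func alignDown(addr, align uintptr) uintptr {
	return addr &^ (align - 1)
}

// alignUp rounds addr up to a multiple of align (align must be a power of two).
func alignUp(addr, align uintptr) uintptr {
	return (addr + align - 1) &^ (align - 1)
}

func main() {
	const (
		align     = 32             // hypothetical alignment requirement
		scratch   = 64             // hypothetical scratch space needed by the routine
		frameSize = scratch + align // declared frame: scratch plus alignment slack
	)
	sp := uintptr(0x1008) // hypothetical, unaligned SP at the bottom of the frame

	down := alignDown(sp, align)
	up := alignUp(sp, align)

	fmt.Printf("sp=%#x down=%#x up=%#x\n", sp, down, up)
	// Rounding down lands below sp, outside the declared frame.
	fmt.Println("aligned-down buffer inside frame:", down >= sp)
	// Rounding up stays within [sp, sp+frameSize).
	fmt.Println("aligned-up buffer inside frame:", up >= sp && up+scratch <= sp+frameSize)
}
```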
Change-Id: Ic426338c45c94a2c01d549860c2295a0ee9200be
Reviewed-on: https://go-review.googlesource.com/31585
Reviewed-by: Adam Langley <agl@golang.org>
Reviewed-by: Andreas Auernhammer <aead@mail.de>
2016-10-21 19:51:14 +00:00
Andreas Auernhammer
9e9c7d4ed3
blake2b: new package
Add the hash function BLAKE2b defined in RFC 7693.
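As background for the SIMD variants benchmarked in this log, here is a minimal pure-Go sketch of the BLAKE2b mixing function G as described in RFC 7693, section 3.1. It is not the package's code; the function name `g` and the call in `main` are illustrative only, and the sketch covers a single mixing step rather than the full compression function.

```go
package main

import (
	"fmt"
	"math/bits"
)

// g mixes two message words x and y into the work-vector words
// v[a], v[b], v[c], v[d]. BLAKE2b uses 64-bit words and the
// right-rotation constants 32, 24, 16, and 63.
func g(v *[16]uint64, a, b, c, d int, x, y uint64) {
	v[a] = v[a] + v[b] + x
	v[d] = bits.RotateLeft64(v[d]^v[a], -32) // negative count rotates right
	v[c] = v[c] + v[d]
	v[b] = bits.RotateLeft64(v[b]^v[c], -24)
	v[a] = v[a] + v[b] + y
	v[d] = bits.RotateLeft64(v[d]^v[a], -16)
	v[c] = v[c] + v[d]
	v[b] = bits.RotateLeft64(v[b]^v[c], -63)
}

func main() {
	var v [16]uint64
	// Mix the first column (indices 0, 4, 8, 12) with illustrative message words.
	g(&v, 0, 4, 8, 12, 1, 0)
	fmt.Printf("%#x %#x %#x %#x\n", v[0], v[4], v[8], v[12])
}
```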
On amd64/AVX2
name        time/op
Write128-4   164ns ± 1%
Write1K-4   1.19µs ± 2%
Sum128-4     187ns ± 1%
Sum1K-4     1.21µs ± 1%

name        speed
Write128-4  800MB/s ± 1%
Write1K-4   856MB/s ± 2%
Sum128-4    682MB/s ± 1%
Sum1K-4     842MB/s ± 1%
On amd64/SSE4
name        time/op
Write128-4   192ns ± 1%
Write1K-4   1.37µs ± 0%
Sum128-4     215ns ± 1%
Sum1K-4     1.39µs ± 1%

name        speed
Write128-4  665MB/s ± 1%
Write1K-4   746MB/s ± 0%
Sum128-4    592MB/s ± 1%
Sum1K-4     735MB/s ± 1%
Change-Id: I0b41ae136980b0e8a970be330e8cc5f02b9e6818
Reviewed-on: https://go-review.googlesource.com/30918
Reviewed-by: Adam Langley <agl@golang.org>
2016-10-18 17:05:21 +00:00