Commit graph

5 commits

Author SHA1 Message Date
Andreas Auernhammer f671756e04 blake2b: fix AVX performance problems on amd64
On some amd64 CPUs (Xeon E5-2680v4 / E5-2620v3), mixing SSE and AVX instructions
leads to very low performance.
On an i7-6500U the SSE-AVX code performs as follows:

AVX2:
name        time/op
Write128-4    165ns ± 0%
Write1K-4    1.20µs ± 0%
Sum128-4      189ns ± 1%
Sum1K-4      1.22µs ± 0%

name        speed
Write128-4  773MB/s ± 1%
Write1K-4   855MB/s ± 0%
Sum128-4    675MB/s ± 1%
Sum1K-4     838MB/s ± 0%

while the same code achieves values < 65MB/s on a Xeon E5-2620v3.

Replacing the `MOVQ` and `PINSRQ` instructions with their AVX counterparts `VMOVQ` and `VPINSRQ`
brings the performance of the AVX/AVX2 code up to the expected level:

name         old time/op    new time/op     delta
Write128-12    2.20µs ±10%     0.22µs ± 9%    -90.00%  (p=0.029 n=4+4)
Write1K-12     16.2µs ± 0%      1.1µs ± 0%    -93.07%  (p=0.029 n=4+4)
Sum128-12      2.10µs ± 0%     0.22µs ± 0%    -89.47%  (p=0.029 n=4+4)
Sum1K-12       16.3µs ± 0%      1.2µs ± 0%    -92.65%  (p=0.029 n=4+4)

name         old speed      new speed       delta
Write128-12  58.5MB/s ±10%  582.8MB/s ±10%   +897.08%  (p=0.029 n=4+4)
Write1K-12   63.1MB/s ± 0%  909.8MB/s ± 0%  +1341.40%  (p=0.029 n=4+4)
Sum128-12    60.8MB/s ± 0%  576.3MB/s ± 0%   +847.84%  (p=0.029 n=4+4)
Sum1K-12     62.8MB/s ± 0%  855.2MB/s ± 0%  +1260.78%  (p=0.029 n=4+4)

The AVX/AVX2 code now uses only AVX (no SSE) instructions.

Fixes golang/go#18563.

Change-Id: I1961dd8fa02014642587523b7f099816a263c9f5
Reviewed-on: https://go-review.googlesource.com/34993
Reviewed-by: Adam Langley <agl@golang.org>
2017-02-08 19:53:58 +00:00
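
The time/op and speed columns in the commit above are related by simple arithmetic, which the following minimal sketch reproduces. It assumes that Write128 processes 128 bytes per operation (inferred from the benchmark name) and that 1 MB means 10^6 bytes; the helper names `throughputMBps` and `deltaPercent` are hypothetical and not part of the blake2b package. Results computed from the rounded table values can differ slightly from the reported ones, since the tables are averaged over raw samples.

```go
package main

import "fmt"

// throughputMBps converts a payload size in bytes and a time per operation
// in nanoseconds into megabytes per second (assuming 1 MB = 1e6 bytes).
// bytes/ns equals GB/s, so multiplying by 1e3 yields MB/s.
func throughputMBps(payloadBytes int, nsPerOp float64) float64 {
	return float64(payloadBytes) / nsPerOp * 1e3
}

// deltaPercent returns the relative change from oldVal to newVal in percent.
func deltaPercent(oldVal, newVal float64) float64 {
	return (newVal - oldVal) / oldVal * 100
}

func main() {
	// Write128 on the i7-6500U: 165ns per operation.
	fmt.Printf("i7-6500U Write128: %.0f MB/s\n", throughputMBps(128, 165))

	// Write128 on the Xeon before and after the fix: 2.20µs and 0.22µs.
	fmt.Printf("Xeon old Write128: %.1f MB/s\n", throughputMBps(128, 2200))
	fmt.Printf("Xeon new Write128: %.1f MB/s\n", throughputMBps(128, 220))
	fmt.Printf("Xeon time delta:   %.2f%%\n", deltaPercent(2200, 220))
}
```

Under these assumptions, 165ns per 128-byte write works out to roughly 776MB/s, in line with the reported 773MB/s, and the tenfold drop in time/op on the Xeon corresponds to the reported -90% delta.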
Mikio Hara f6b343c37c blake2b: fix build on non-amd64 platforms
Change-Id: Ib9ebb1a2eff4b61f60453086be5c63ac7af1f7fc
Reviewed-on: https://go-review.googlesource.com/34672
Run-TryBot: Mikio Hara <mikioh.mikioh@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@golang.org>
2016-12-21 23:57:47 +00:00
Andreas Auernhammer d8e61c69ab blake2b: add AVX assembly
Add an AVX implementation and improve the SSE4.1 assembly.

AVX vs SSE4.1
name        old time/op   new time/op   delta
Write128-8   249ns ± 0%    220ns ± 0%   -11.85%  (p=0.029 n=4+4)
Write1K-8   1.68µs ± 1%   1.56µs ± 1%    -6.71%  (p=0.029 n=4+4)
Write32K-8  52.6µs ± 0%   48.7µs ± 0%    -7.40%  (p=0.029 n=4+4)
Sum128-8     264ns ± 0%    241ns ± 1%    -8.52%  (p=0.029 n=4+4)
Sum1K-8     1.70µs ± 0%   1.57µs ± 0%    -7.79%  (p=0.029 n=4+4)
Sum32K-8    54.1µs ± 3%   49.5µs ± 1%    -8.36%  (p=0.029 n=4+4)

name        old speed     new speed     delta
Write128-8  513MB/s ± 0%  582MB/s ± 0%  +13.38%  (p=0.029 n=4+4)
Write1K-8   610MB/s ± 1%  654MB/s ± 1%   +7.22%  (p=0.029 n=4+4)
Write32K-8  622MB/s ± 0%  672MB/s ± 0%   +7.99%  (p=0.029 n=4+4)
Sum128-8    484MB/s ± 1%  529MB/s ± 0%   +9.21%  (p=0.029 n=4+4)
Sum1K-8     602MB/s ± 0%  653MB/s ± 0%   +8.42%  (p=0.029 n=4+4)
Sum32K-8    607MB/s ± 3%  662MB/s ± 1%   +9.03%  (p=0.029 n=4+4)

AVX2 vs AVX
name        old time/op   new time/op   delta
Write128-4   192ns ± 0%    166ns ± 0%   -14.03%  (p=0.029 n=4+4)
Write1K-4   1.37µs ± 0%   1.19µs ± 0%   -12.65%  (p=0.029 n=4+4)
Write32K-4  42.5µs ± 0%   37.3µs ± 0%   -12.33%  (p=0.029 n=4+4)
Sum128-4     213ns ± 0%    188ns ± 0%   -11.97%  (p=0.029 n=4+4)
Sum1K-4     1.40µs ± 0%   1.22µs ± 0%   -12.85%  (p=0.029 n=4+4)
Sum32K-4    42.8µs ± 0%   37.3µs ± 0%   -12.94%  (p=0.029 n=4+4)

name        old speed     new speed     delta
Write128-4  662MB/s ± 0%  771MB/s ± 0%  +16.47%  (p=0.029 n=4+4)
Write1K-4   748MB/s ± 0%  857MB/s ± 0%  +14.49%  (p=0.029 n=4+4)
Write32K-4  771MB/s ± 0%  879MB/s ± 0%  +14.07%  (p=0.029 n=4+4)
Sum128-4    600MB/s ± 0%  680MB/s ± 0%  +13.49%  (p=0.029 n=4+4)
Sum1K-4     733MB/s ± 0%  841MB/s ± 0%  +14.72%  (p=0.029 n=4+4)
Sum32K-4    765MB/s ± 0%  879MB/s ± 0%  +14.85%  (p=0.029 n=4+4)

Change-Id: Idf85742e952c07b76c0c7fb5404ed9b0caf0f6eb
Reviewed-on: https://go-review.googlesource.com/34319
Reviewed-by: Adam Langley <agl@golang.org>
2016-12-20 18:11:55 +00:00
Austin Clements e67f5eca87 blake2b: use proper Go frame sizes
Similar to the previous commit, blake2b's assembly routines claim they
have a zero byte frame and manually subtract a frame from the SP,
which can fail to grow the stack when necessary, leading to memory
corruption.

Fix this by using the correct stack frame sizes so the generated stack
growth prologue is correct, and aligning the SP up instead of down.

Change-Id: Ic426338c45c94a2c01d549860c2295a0ee9200be
Reviewed-on: https://go-review.googlesource.com/31585
Reviewed-by: Adam Langley <agl@golang.org>
Reviewed-by: Andreas Auernhammer <aead@mail.de>
2016-10-21 19:51:14 +00:00
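
The difference between aligning the SP up and aligning it down, as described in the commit above, can be illustrated with plain address arithmetic. This is a simplified model and not the actual assembly: the 32-byte alignment, the 64-byte scratch area, the sample stack pointer value, and the helper names `alignUp`, `alignDown`, and `fitsInFrame` are all hypothetical. The model treats the declared frame as the address range starting at SP and extending upward by the frame size, which is the only region the generated stack growth prologue accounts for.

```go
package main

import "fmt"

// alignDown rounds addr down to a multiple of align (align must be a power of two).
func alignDown(addr, align uintptr) uintptr {
	return addr &^ (align - 1)
}

// alignUp rounds addr up to a multiple of align (align must be a power of two).
func alignUp(addr, align uintptr) uintptr {
	return (addr + align - 1) &^ (align - 1)
}

// fitsInFrame reports whether [base, base+size) lies inside the declared
// frame [sp, sp+frame).
func fitsInFrame(base, size, sp, frame uintptr) bool {
	return base >= sp && base+size <= sp+frame
}

func main() {
	const (
		align   = 32              // hypothetical alignment requirement
		scratch = 64              // hypothetical scratch space the routine needs
		frame   = scratch + align // declared frame size, padded for alignment
	)
	sp := uintptr(0x1008) // hypothetical stack pointer, not 32-byte aligned

	up := alignUp(sp, align)
	down := alignDown(sp, align)

	fmt.Printf("aligned up:   %#x, inside frame: %v\n", up, fitsInFrame(up, scratch, sp, frame))
	fmt.Printf("aligned down: %#x, inside frame: %v\n", down, fitsInFrame(down, scratch, sp, frame))
}
```

In this model, rounding up keeps the aligned scratch area inside the padded frame, while rounding down moves it below SP, outside the memory the declared frame covers.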
Andreas Auernhammer 9e9c7d4ed3 blake2b: new package
Add the hash function BLAKE2b defined in RFC 7693.

On amd64/AVX2
name        time/op
Write128-4   164ns ± 1%
Write1K-4   1.19µs ± 2%
Sum128-4     187ns ± 1%
Sum1K-4     1.21µs ± 1%

name        speed
Write128-4  800MB/s ± 1%
Write1K-4   856MB/s ± 2%
Sum128-4    682MB/s ± 1%
Sum1K-4     842MB/s ± 1%

On amd64/SSE4
name        time/op
Write128-4   192ns ± 1%
Write1K-4   1.37µs ± 0%
Sum128-4     215ns ± 1%
Sum1K-4     1.39µs ± 1%

name        speed
Write128-4  665MB/s ± 1%
Write1K-4   746MB/s ± 0%
Sum128-4    592MB/s ± 1%
Sum1K-4     735MB/s ± 1%

Change-Id: I0b41ae136980b0e8a970be330e8cc5f02b9e6818
Reviewed-on: https://go-review.googlesource.com/30918
Reviewed-by: Adam Langley <agl@golang.org>
2016-10-18 17:05:21 +00:00