crypto

Commit graph

Author	SHA1	Message	Date
Filippo Valsorda	2aa609cf4a	chacha20,poly1305,chacha20poly1305: set consistent build tags
appengine was only necessary for the legacy system based on Go 1.9, drop that. Add purego tags instead. Remove redundant architecture tags.
Change-Id: Ib1f65a4837511e63e08c1aa43163a79cfe868e0c
Reviewed-on: https://go-review.googlesource.com/c/crypto/+/215498
Run-TryBot: Filippo Valsorda <filippo@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Katie Hockman <katie@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>	2020-02-21 23:15:18 +00:00
Filippo Valsorda	61a87790db	poly1305: drop broken arm assembly
The ARM assembly uses the reserved G register. This started causing frequent crashes due to async preemption, but it was already broken in the presence of signals, including SIGPROF.
name                          old speed      new speed      delta
Chacha20Poly1305/Open-64      2.88MB/s ± 0%  1.85MB/s ± 0%  -35.76%  (p=0.008 n=6+7)
Chacha20Poly1305/Seal-64      3.17MB/s ± 1%  1.97MB/s ± 0%  -37.78%  (p=0.000 n=10+8)
Chacha20Poly1305/Open-64-X    2.41MB/s ± 0%  1.61MB/s ± 0%  -33.29%  (p=0.000 n=9+9)
Chacha20Poly1305/Seal-64-X    2.55MB/s ± 0%  1.64MB/s ± 0%  -35.61%  (p=0.000 n=10+9)
Chacha20Poly1305/Open-1350    8.43MB/s ± 0%  4.15MB/s ± 0%  -50.78%  (p=0.000 n=10+10)
Chacha20Poly1305/Seal-1350    8.55MB/s ± 0%  4.18MB/s ± 0%  -51.12%  (p=0.000 n=9+9)
Chacha20Poly1305/Open-1350-X  8.16MB/s ± 0%  4.06MB/s ± 0%  -50.18%  (p=0.000 n=10+10)
Chacha20Poly1305/Seal-1350-X  8.24MB/s ± 1%  4.08MB/s ± 1%  -50.53%  (p=0.000 n=10+10)
Chacha20Poly1305/Open-8192    9.73MB/s ± 1%  4.56MB/s ± 0%  -53.15%  (p=0.000 n=9+10)
Chacha20Poly1305/Seal-8192    9.57MB/s ± 0%  4.52MB/s ± 0%  -52.77%  (p=0.000 n=9+9)
Chacha20Poly1305/Open-8192-X  9.65MB/s ± 0%  4.54MB/s ± 0%  -52.95%  (p=0.000 n=10+7)
Chacha20Poly1305/Seal-8192-X  9.47MB/s ± 1%  4.50MB/s ± 0%  -52.50%  (p=0.000 n=10+9)
Fixes golang/go#35511
Change-Id: I5e5ca3a0499f04c5fece5bc669a417e32d2656c6
Reviewed-on: https://go-review.googlesource.com/c/crypto/+/213880
Run-TryBot: Filippo Valsorda <filippo@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>	2020-01-09 15:21:10 +00:00
Filippo Valsorda	2dbfe9001f	poly1305: rewrite the Go implementation with 64-bit limbs
The new code is meant to be readable without external references for Poly1305, and explains the field logic.
The generic code is now 30-50% faster on an Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz, and even better on a 3.1 GHz i7 MacBook.
name        old time/op    new time/op    delta
64-48       126ns ± 0%     80ns ± 1%      -36.24%  (p=0.000 n=16+20)
1K-48       1.07µs ± 0%    0.81µs ± 2%    -23.63%  (p=0.000 n=19+20)
2M-48       2.07ms ± 0%    1.61ms ± 1%    -22.31%  (p=0.000 n=20+20)
Write64-48  79.3ns ± 0%    58.0ns ± 1%    -26.89%  (p=0.000 n=20+19)
Write1K-48  1.02µs ± 0%    0.79µs ± 1%    -22.91%  (p=0.000 n=19+19)
Write2M-48  2.07ms ± 0%    1.61ms ± 2%    -22.33%  (p=0.000 n=17+20)
name        old speed      new speed      delta
64-48       508MB/s ± 0%   797MB/s ± 1%   +56.95%  (p=0.000 n=16+20)
1K-48       960MB/s ± 0%   1257MB/s ± 2%  +30.94%  (p=0.000 n=18+20)
2M-48       1.01GB/s ± 0%  1.30GB/s ± 1%  +28.73%  (p=0.000 n=20+20)
Write64-48  807MB/s ± 0%   1104MB/s ± 1%  +36.78%  (p=0.000 n=18+19)
Write1K-48  1.00GB/s ± 0%  1.30GB/s ± 1%  +29.71%  (p=0.000 n=18+19)
Write2M-48  1.01GB/s ± 0%  1.31GB/s ± 2%  +28.77%  (p=0.000 n=17+20)
The assembly is still 50-90% faster on the Xeon, 30-60% on the MacBook. The Go code does not use all the arithmetic tricks the assembly does, and it does not have access to the three operand wide shift instruction.
name        old time/op    new time/op    delta
64-48       80.3ns ± 1%    54.2ns ± 0%    -32.50%  (p=0.000 n=20+17)
1K-48       815ns ± 2%     446ns ± 1%     -45.27%  (p=0.000 n=20+20)
2M-48       1.61ms ± 1%    0.86ms ± 0%    -46.54%  (p=0.000 n=20+17)
Write64-48  58.0ns ± 1%    34.0ns ± 0%    -41.34%  (p=0.000 n=19+20)
Write1K-48  790ns ± 1%     427ns ± 0%     -45.92%  (p=0.000 n=19+17)
Write2M-48  1.61ms ± 2%    0.86ms ± 0%    -46.51%  (p=0.000 n=20+20)
name        old speed      new speed      delta
64-48       797MB/s ± 1%   1180MB/s ± 0%  +48.09%  (p=0.000 n=20+19)
1K-48       1.26GB/s ± 2%  2.30GB/s ± 1%  +82.71%  (p=0.000 n=20+20)
2M-48       1.30GB/s ± 1%  2.44GB/s ± 0%  +87.04%  (p=0.000 n=20+17)
Write64-48  1.10GB/s ± 1%  1.88GB/s ± 0%  +70.52%  (p=0.000 n=19+18)
Write1K-48  1.30GB/s ± 1%  2.40GB/s ± 0%  +84.84%  (p=0.000 n=19+18)
Write2M-48  1.31GB/s ± 2%  2.44GB/s ± 0%  +86.93%  (p=0.000 n=20+20)
Hopefully this will also avoid the need for an arm64 implementation.
Since now the Go and the amd64/ppc64le assembly use the same limb schedule, drop the assembly initialize and finalize implementations, and make the wrapper code match. It comes with a minor slowdown.
name        old time/op    new time/op    delta
64-48       50.3ns ± 0%    54.2ns ± 0%    +7.73%  (p=0.000 n=20+17)
1K-48       441ns ± 0%     446ns ± 1%     +1.10%  (p=0.000 n=19+20)
2M-48       860µs ± 0%     859µs ± 0%     ~       (p=0.178 n=19+17)
Write64-48  34.0ns ± 0%    34.0ns ± 0%    ~       (all equal)
Write1K-48  424ns ± 0%     427ns ± 0%     +0.71%  (p=0.000 n=17+17)
Write2M-48  860µs ± 0%     859µs ± 0%     -0.04%  (p=0.000 n=19+20)
name        old speed      new speed      delta
64-48       1.27GB/s ± 0%  1.18GB/s ± 0%  -7.20%  (p=0.000 n=20+19)
1K-48       2.32GB/s ± 0%  2.30GB/s ± 1%  -1.07%  (p=0.000 n=18+20)
2M-48       2.44GB/s ± 0%  2.44GB/s ± 0%  ~       (p=0.173 n=19+17)
Write64-48  1.88GB/s ± 0%  1.88GB/s ± 0%  +0.04%  (p=0.000 n=19+18)
Write1K-48  2.41GB/s ± 0%  2.40GB/s ± 0%  -0.67%  (p=0.000 n=19+18)
Write2M-48  2.44GB/s ± 0%  2.44GB/s ± 0%  +0.04%  (p=0.000 n=19+20)
Since poly1305/sum_generic.go was almost entirely rewritten, it's probably best reviewed on gitiles.
This is the implementation published at https://blog.filippo.io/a-literate-go-implementation-of-poly1305/
Updates #31470
Change-Id: I74f9011d3ee317a43b05ae7f05d96081d08bffd3
Reviewed-on: https://go-review.googlesource.com/c/crypto/+/169037
Reviewed-by: Katie Hockman <katie@golang.org>	2019-11-11 21:33:42 +00:00
Lynn Boger	f99c8df09e	poly1305: improve performance with asm for ppc64le
This adds an asm implementation for poly1305 on ppc64le, based on the amd64 asm implementation using the mac interface.
The improvements on a power8 based on the poly1305 benchmarks are:
name              old time/op  new time/op  delta
64                172ns ± 0%   78ns ± 0%    -54.77%  (p=1.000 n=1+1)
1K                1.47µs ± 0%  0.59µs ± 0%  -59.69%  (p=1.000 n=1+1)
2M                2.84ms ± 0%  1.12ms ± 0%  -60.47%  (p=1.000 n=1+1)
64Unaligned       172ns ± 0%   78ns ± 0%    -54.59%  (p=1.000 n=1+1)
1KUnaligned       1.47µs ± 0%  0.59µs ± 0%  -59.69%  (p=1.000 n=1+1)
2MUnaligned       2.84ms ± 0%  1.13ms ± 0%  -60.23%  (p=1.000 n=1+1)
Write64           100ns ± 0%   46ns ± 0%    -53.80%  (p=1.000 n=1+1)
Write1K           1.40µs ± 0%  0.56µs ± 0%  -59.90%  (p=1.000 n=1+1)
Write2M           2.84ms ± 0%  1.12ms ± 0%  -60.46%  (p=1.000 n=1+1)
Write64Unaligned  100ns ± 0%   46ns ± 0%    -53.60%  (p=1.000 n=1+1)
Write1KUnaligned  1.40µs ± 0%  0.56µs ± 0%  -59.90%  (p=1.000 n=1+1)
Write2MUnaligned  2.84ms ± 0%  1.13ms ± 0%  -60.22%  (p=1.000 n=1+1)
Change-Id: I77cc9bb3645a6b1a6edc414b5651dc37ae4a7410
Reviewed-on: https://go-review.googlesource.com/c/crypto/+/173421
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <mike.munday@ibm.com>	2019-06-05 12:30:33 +00:00
Andreas Auernhammer	c2843e01d9	poly1305: implement a subset of the hash.Hash interface
This CL adds the poly1305.MAC type which implements a subset of the hash.Hash interface. With MAC it is possible to compute an authentication tag of data without copying it into a single byte slice.
This commit modifies the reference/generic and the AMD64 assembler but not the ARM/s390x implementation to support an io.Writer interface.
Updates golang/go#25219
Change-Id: I7ee5a9eadd43387cf3cd887d734c625575eee47d
Reviewed-on: https://go-review.googlesource.com/c/crypto/+/111335
Run-TryBot: Filippo Valsorda <filippo@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Filippo Valsorda <filippo@golang.org>	2019-03-08 22:17:18 +00:00
bill_ofarrell	4eb8c2c8d8	poly1305: add optimized s390x SIMD implementation with VMSL
SIMD implementation based on the algorithm outlined in:
NEON crypto, Daniel J. Bernstein and Peter Schwabe https://cryptojedi.org/papers/neoncrypto-20120320.pdf
and as modified for VMSL as described in
Accelerating Poly1305 Cryptographic Message Authentication on the z14, O'Farrell, Gadriwala, et al, CASCON 2017, p48-55 https://ibm.ent.box.com/s/jf9gedj0e9d2vjctfyh186shaztavnht
name         old      new        delta
64           485MB/s  1315 MB/s  +171.58%
1K           607MB/s  4352 MB/s  +616.97%
64Unaligned  485MB/s  1373 MB/s  +183.09%
1KUnaligned  606MB/s  4286 MB/s  +607.26%
2M           607MB/s  5529 MB/s  +810.87%
Change-Id: I31ccc25ced09180d99ea5c9233f0dcdc8666fc98
Reviewed-on: https://go-review.googlesource.com/110297
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <mike.munday@ibm.com>	2018-05-14 22:55:51 +00:00

6 commits
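
Commit 2dbfe9001f describes rewriting the generic Poly1305 code around 64-bit limbs. As a rough illustration of what arithmetic on 64-bit limbs involves, the sketch below multiplies two limbs into a 128-bit result and adds 128-bit values with carry propagation, using only the standard math/bits package. The uint128 type and the mul64 and add128 helpers are names chosen for this example and are assumptions, not a copy of the code in poly1305/sum_generic.go; the sketch also omits the reduction modulo 2^130 - 5 that a real Poly1305 implementation needs.

```go
package main

import (
	"fmt"
	"math/bits"
)

// uint128 holds a 128-bit value as two 64-bit halves.
// Illustrative helper type for this sketch only.
type uint128 struct {
	lo, hi uint64
}

// mul64 returns the full 128-bit product of two 64-bit limbs.
func mul64(a, b uint64) uint128 {
	hi, lo := bits.Mul64(a, b)
	return uint128{lo: lo, hi: hi}
}

// add128 adds two 128-bit values, propagating the carry from the low
// half into the high half. It reports whether the sum overflowed 128 bits.
func add128(a, b uint128) (uint128, bool) {
	lo, carry := bits.Add64(a.lo, b.lo, 0)
	hi, carry := bits.Add64(a.hi, b.hi, carry)
	return uint128{lo: lo, hi: hi}, carry != 0
}

func main() {
	// 2^63 * 4 = 2^65, which does not fit in a single 64-bit limb.
	p := mul64(1<<63, 4)
	fmt.Printf("p: hi=%d lo=%d\n", p.hi, p.lo)

	// (2^64 - 1)^2 = 2^128 - 2^65 + 1, the largest possible limb product.
	max := ^uint64(0)
	q := mul64(max, max)
	fmt.Printf("q: hi=%#x lo=%d\n", q.hi, q.lo)

	// p + q = 2^128 + 1, so the sum wraps and the overflow flag is set.
	s, overflow := add128(p, q)
	fmt.Printf("p+q: hi=%d lo=%d overflow=%v\n", s.hi, s.lo, overflow)
}
```

Keeping every intermediate product in an explicit hi/lo pair is what lets portable Go code handle wide multiplications without assembly; the benchmark figures in the commit message compare the actual rewritten generic code against the assembly, not this sketch.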