Filippo Valsorda
2aa609cf4a
chacha20,poly1305,chacha20poly1305: set consistent build tags
...
The appengine tag was only necessary for the legacy system based on Go 1.9,
so drop it. Add purego tags instead, and remove redundant architecture tags.
Change-Id: Ib1f65a4837511e63e08c1aa43163a79cfe868e0c
Reviewed-on: https://go-review.googlesource.com/c/crypto/+/215498
Run-TryBot: Filippo Valsorda <filippo@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Katie Hockman <katie@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
2020-02-21 23:15:18 +00:00
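As a rough illustration of the tag scheme in 2aa609cf4a, here is a minimal sketch of how an assembly-backed file might be gated so that building with -tags purego excludes it. It uses the newer //go:build syntax, which postdates this commit, and the exact constraint expression is an assumption rather than a copy of the package's headers. The names useAssembly and backend are hypothetical.

```go
//go:build gc && !purego

// This file is only compiled by the standard gc toolchain and only when the
// purego tag is NOT set. A sibling file with the inverse constraint
// (purego || !gc) would provide the pure Go fallback.
package main

// useAssembly is a hypothetical flag; in a real package this file would
// instead hold the Go declarations for the assembly routines.
const useAssembly = true

// backend reports which implementation this build selected.
func backend() string {
	if useAssembly {
		return "assembly"
	}
	return "generic"
}
```

Building with `go build -tags purego` would drop this file, so the fallback file (not shown) would have to define the same names.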
Filippo Valsorda
61a87790db
poly1305: drop broken arm assembly
...
The ARM assembly uses the reserved G register. This started causing
frequent crashes due to async preemption, but it was already broken in
the presence of signals, including SIGPROF.
name old speed new speed delta
Chacha20Poly1305/Open-64 2.88MB/s ± 0% 1.85MB/s ± 0% -35.76% (p=0.008 n=6+7)
Chacha20Poly1305/Seal-64 3.17MB/s ± 1% 1.97MB/s ± 0% -37.78% (p=0.000 n=10+8)
Chacha20Poly1305/Open-64-X 2.41MB/s ± 0% 1.61MB/s ± 0% -33.29% (p=0.000 n=9+9)
Chacha20Poly1305/Seal-64-X 2.55MB/s ± 0% 1.64MB/s ± 0% -35.61% (p=0.000 n=10+9)
Chacha20Poly1305/Open-1350 8.43MB/s ± 0% 4.15MB/s ± 0% -50.78% (p=0.000 n=10+10)
Chacha20Poly1305/Seal-1350 8.55MB/s ± 0% 4.18MB/s ± 0% -51.12% (p=0.000 n=9+9)
Chacha20Poly1305/Open-1350-X 8.16MB/s ± 0% 4.06MB/s ± 0% -50.18% (p=0.000 n=10+10)
Chacha20Poly1305/Seal-1350-X 8.24MB/s ± 1% 4.08MB/s ± 1% -50.53% (p=0.000 n=10+10)
Chacha20Poly1305/Open-8192 9.73MB/s ± 1% 4.56MB/s ± 0% -53.15% (p=0.000 n=9+10)
Chacha20Poly1305/Seal-8192 9.57MB/s ± 0% 4.52MB/s ± 0% -52.77% (p=0.000 n=9+9)
Chacha20Poly1305/Open-8192-X 9.65MB/s ± 0% 4.54MB/s ± 0% -52.95% (p=0.000 n=10+7)
Chacha20Poly1305/Seal-8192-X 9.47MB/s ± 1% 4.50MB/s ± 0% -52.50% (p=0.000 n=10+9)
Fixes golang/go#35511
Change-Id: I5e5ca3a0499f04c5fece5bc669a417e32d2656c6
Reviewed-on: https://go-review.googlesource.com/c/crypto/+/213880
Run-TryBot: Filippo Valsorda <filippo@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-01-09 15:21:10 +00:00
Filippo Valsorda
2dbfe9001f
poly1305: rewrite the Go implementation with 64-bit limbs
...
The new code is meant to be readable without external references for
Poly1305, and explains the field logic. The generic code is now 30-50%
faster on an Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz, and even better
on a 3.1 GHz i7 MacBook.
name old time/op new time/op delta
64-48 126ns ± 0% 80ns ± 1% -36.24% (p=0.000 n=16+20)
1K-48 1.07µs ± 0% 0.81µs ± 2% -23.63% (p=0.000 n=19+20)
2M-48 2.07ms ± 0% 1.61ms ± 1% -22.31% (p=0.000 n=20+20)
Write64-48 79.3ns ± 0% 58.0ns ± 1% -26.89% (p=0.000 n=20+19)
Write1K-48 1.02µs ± 0% 0.79µs ± 1% -22.91% (p=0.000 n=19+19)
Write2M-48 2.07ms ± 0% 1.61ms ± 2% -22.33% (p=0.000 n=17+20)
name old speed new speed delta
64-48 508MB/s ± 0% 797MB/s ± 1% +56.95% (p=0.000 n=16+20)
1K-48 960MB/s ± 0% 1257MB/s ± 2% +30.94% (p=0.000 n=18+20)
2M-48 1.01GB/s ± 0% 1.30GB/s ± 1% +28.73% (p=0.000 n=20+20)
Write64-48 807MB/s ± 0% 1104MB/s ± 1% +36.78% (p=0.000 n=18+19)
Write1K-48 1.00GB/s ± 0% 1.30GB/s ± 1% +29.71% (p=0.000 n=18+19)
Write2M-48 1.01GB/s ± 0% 1.31GB/s ± 2% +28.77% (p=0.000 n=17+20)
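To make the 64-bit limb idea concrete, the following is a minimal sketch, not the committed code. It loads the r half of a Poly1305 key as two little-endian 64-bit limbs, applies the standard Poly1305 clamping, and provides 128-bit helpers built on math/bits. The names loadR, mul64, add128, rMask0 and rMask1 are chosen for illustration and may not match sum_generic.go. Silently discarding overflow in add128 is a simplification made here.

```go
package main

import (
	"encoding/binary"
	"math/bits"
)

// Clamping masks for the two limbs of r: Poly1305 requires the top four bits
// of every 32-bit word of r, and the bottom two bits of the upper three
// words, to be zero.
const (
	rMask0 = 0x0FFFFFFC0FFFFFFF
	rMask1 = 0x0FFFFFFC0FFFFFFC
)

// uint128 holds a 128-bit value as two 64-bit halves.
type uint128 struct{ lo, hi uint64 }

// loadR reads the first 16 key bytes as two little-endian limbs and clamps
// them. The remaining 16 key bytes (s) are not touched here.
func loadR(key *[32]byte) (r0, r1 uint64) {
	r0 = binary.LittleEndian.Uint64(key[0:8]) & rMask0
	r1 = binary.LittleEndian.Uint64(key[8:16]) & rMask1
	return r0, r1
}

// mul64 returns the full 128-bit product a * b.
func mul64(a, b uint64) uint128 {
	hi, lo := bits.Mul64(a, b)
	return uint128{lo, hi}
}

// add128 returns a + b modulo 2^128 (overflow is discarded in this sketch).
func add128(a, b uint128) uint128 {
	lo, carry := bits.Add64(a.lo, b.lo, 0)
	hi, _ := bits.Add64(a.hi, b.hi, carry)
	return uint128{lo, hi}
}
```

With two clamped limbs, each limb product fits in a uint128, which is what lets the multiplication be written with a handful of mul64 and add128 calls.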
The assembly is still 50-90% faster on the Xeon, 30-60% on the MacBook.
The Go code does not use all the arithmetic tricks the assembly does,
and it does not have access to the three-operand wide shift instruction.
name old time/op new time/op delta
64-48 80.3ns ± 1% 54.2ns ± 0% -32.50% (p=0.000 n=20+17)
1K-48 815ns ± 2% 446ns ± 1% -45.27% (p=0.000 n=20+20)
2M-48 1.61ms ± 1% 0.86ms ± 0% -46.54% (p=0.000 n=20+17)
Write64-48 58.0ns ± 1% 34.0ns ± 0% -41.34% (p=0.000 n=19+20)
Write1K-48 790ns ± 1% 427ns ± 0% -45.92% (p=0.000 n=19+17)
Write2M-48 1.61ms ± 2% 0.86ms ± 0% -46.51% (p=0.000 n=20+20)
name old speed new speed delta
64-48 797MB/s ± 1% 1180MB/s ± 0% +48.09% (p=0.000 n=20+19)
1K-48 1.26GB/s ± 2% 2.30GB/s ± 1% +82.71% (p=0.000 n=20+20)
2M-48 1.30GB/s ± 1% 2.44GB/s ± 0% +87.04% (p=0.000 n=20+17)
Write64-48 1.10GB/s ± 1% 1.88GB/s ± 0% +70.52% (p=0.000 n=19+18)
Write1K-48 1.30GB/s ± 1% 2.40GB/s ± 0% +84.84% (p=0.000 n=19+18)
Write2M-48 1.31GB/s ± 2% 2.44GB/s ± 0% +86.93% (p=0.000 n=20+20)
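The wide shift mentioned above is presumably a double-precision shift that moves bits from one register into another in a single instruction. In plain Go the same effect takes a shift, a mask and an OR, roughly as in this sketch. The helper name shiftRightBy2 and the uint128 type are illustrative, and the shift distance of 2 is an assumption based on the usual Poly1305 reduction modulo 2^130 - 5.

```go
package main

// uint128 holds a 128-bit value as two 64-bit halves.
type uint128 struct{ lo, hi uint64 }

// shiftRightBy2 returns a >> 2. The two low bits of hi are carried into the
// two top bits of lo, which is the part a wide shift instruction would do in
// one step.
func shiftRightBy2(a uint128) uint128 {
	a.lo = a.lo>>2 | (a.hi&3)<<62
	a.hi = a.hi >> 2
	return a
}
```

For example, shifting the value with hi = 7 and lo = 0 leaves hi = 1 and sets the top two bits of lo.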
Hopefully this will also avoid the need for an arm64 implementation.
Since the Go code and the amd64/ppc64le assembly now use the same limb
schedule, drop the assembly initialize and finalize implementations,
and make the wrapper code match. This comes with a minor slowdown.
name old time/op new time/op delta
64-48 50.3ns ± 0% 54.2ns ± 0% +7.73% (p=0.000 n=20+17)
1K-48 441ns ± 0% 446ns ± 1% +1.10% (p=0.000 n=19+20)
2M-48 860µs ± 0% 859µs ± 0% ~ (p=0.178 n=19+17)
Write64-48 34.0ns ± 0% 34.0ns ± 0% ~ (all equal)
Write1K-48 424ns ± 0% 427ns ± 0% +0.71% (p=0.000 n=17+17)
Write2M-48 860µs ± 0% 859µs ± 0% -0.04% (p=0.000 n=19+20)
name old speed new speed delta
64-48 1.27GB/s ± 0% 1.18GB/s ± 0% -7.20% (p=0.000 n=20+19)
1K-48 2.32GB/s ± 0% 2.30GB/s ± 1% -1.07% (p=0.000 n=18+20)
2M-48 2.44GB/s ± 0% 2.44GB/s ± 0% ~ (p=0.173 n=19+17)
Write64-48 1.88GB/s ± 0% 1.88GB/s ± 0% +0.04% (p=0.000 n=19+18)
Write1K-48 2.41GB/s ± 0% 2.40GB/s ± 0% -0.67% (p=0.000 n=19+18)
Write2M-48 2.44GB/s ± 0% 2.44GB/s ± 0% +0.04% (p=0.000 n=19+20)
Since poly1305/sum_generic.go was almost entirely rewritten, it's
probably best reviewed on gitiles.
This is the implementation published at
https://blog.filippo.io/a-literate-go-implementation-of-poly1305/
Updates #31470
Change-Id: I74f9011d3ee317a43b05ae7f05d96081d08bffd3
Reviewed-on: https://go-review.googlesource.com/c/crypto/+/169037
Reviewed-by: Katie Hockman <katie@golang.org>
2019-11-11 21:33:42 +00:00
Lynn Boger
f99c8df09e
poly1305: improve performance with asm for ppc64le
...
This adds an asm implementation for poly1305 on ppc64le, based on
the amd64 asm implementation using the mac interface.
The improvements on a POWER8, based on the poly1305 benchmarks, are:
name old time/op new time/op delta
64 172ns ± 0% 78ns ± 0% -54.77% (p=1.000 n=1+1)
1K 1.47µs ± 0% 0.59µs ± 0% -59.69% (p=1.000 n=1+1)
2M 2.84ms ± 0% 1.12ms ± 0% -60.47% (p=1.000 n=1+1)
64Unaligned 172ns ± 0% 78ns ± 0% -54.59% (p=1.000 n=1+1)
1KUnaligned 1.47µs ± 0% 0.59µs ± 0% -59.69% (p=1.000 n=1+1)
2MUnaligned 2.84ms ± 0% 1.13ms ± 0% -60.23% (p=1.000 n=1+1)
Write64 100ns ± 0% 46ns ± 0% -53.80% (p=1.000 n=1+1)
Write1K 1.40µs ± 0% 0.56µs ± 0% -59.90% (p=1.000 n=1+1)
Write2M 2.84ms ± 0% 1.12ms ± 0% -60.46% (p=1.000 n=1+1)
Write64Unaligned 100ns ± 0% 46ns ± 0% -53.60% (p=1.000 n=1+1)
Write1KUnaligned 1.40µs ± 0% 0.56µs ± 0% -59.90% (p=1.000 n=1+1)
Write2MUnaligned 2.84ms ± 0% 1.13ms ± 0% -60.22% (p=1.000 n=1+1)
Change-Id: I77cc9bb3645a6b1a6edc414b5651dc37ae4a7410
Reviewed-on: https://go-review.googlesource.com/c/crypto/+/173421
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <mike.munday@ibm.com>
2019-06-05 12:30:33 +00:00
Andreas Auernhammer
c2843e01d9
poly1305: implement a subset of the hash.Hash interface
...
This CL adds the poly1305.MAC type which implements a
subset of the hash.Hash interface. With MAC it is possible
to compute an authentication tag of data without copying
it into a single byte slice.
This commit modifies the reference/generic and the AMD64
assembly implementations, but not the ARM/s390x ones,
to support an io.Writer interface.
Updates golang/go#25219
Change-Id: I7ee5a9eadd43387cf3cd887d734c625575eee47d
Reviewed-on: https://go-review.googlesource.com/c/crypto/+/111335
Run-TryBot: Filippo Valsorda <filippo@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
2019-03-08 22:17:18 +00:00
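The streaming property described in c2843e01d9 can be sketched with the standard library alone. The example below uses HMAC-SHA256 purely as a stand-in for Poly1305, so that it has no external dependencies, and it does not call the poly1305 package. The interface name macWriter, its exact method set, and the helper names are assumptions made for illustration.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"io"
)

// macWriter is a hypothetical name for a subset of hash.Hash: data is
// written incrementally and the tag is read with Sum.
type macWriter interface {
	io.Writer
	Size() int
	Sum(b []byte) []byte
}

// tagOfChunks feeds each chunk to m in order and returns the resulting tag.
// The chunks are never concatenated into one slice.
func tagOfChunks(m macWriter, chunks ...[]byte) []byte {
	for _, c := range chunks {
		m.Write(c) // hash.Hash implementations never return an error here
	}
	return m.Sum(nil)
}

// newStandInMAC returns HMAC-SHA256 as a stand-in MAC. This is NOT Poly1305.
func newStandInMAC(key []byte) macWriter {
	return hmac.New(sha256.New, key)
}
```

Calling tagOfChunks with "hello, " and "world" as separate chunks yields the same tag as a single "hello, world" chunk under the same key, which is the behavior a Write-based MAC type is meant to provide.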
bill_ofarrell
4eb8c2c8d8
poly1305: add optimized s390x SIMD implementation with VMSL
...
SIMD implementation based on the algorithm outlined in:
NEON crypto, Daniel J. Bernstein and Peter Schwabe
https://cryptojedi.org/papers/neoncrypto-20120320.pdf
and as modified for VMSL as described in
Accelerating Poly1305 Cryptographic Message Authentication on the z14
O'Farrell, Gadriwala, et al, CASCON 2017, p48-55
https://ibm.ent.box.com/s/jf9gedj0e9d2vjctfyh186shaztavnht
name old speed new speed delta
64 485MB/s 1315MB/s +171.58%
1K 607MB/s 4352MB/s +616.97%
64Unaligned 485MB/s 1373MB/s +183.09%
1KUnaligned 606MB/s 4286MB/s +607.26%
2M 607MB/s 5529MB/s +810.87%
Change-Id: I31ccc25ced09180d99ea5c9233f0dcdc8666fc98
Reviewed-on: https://go-review.googlesource.com/110297
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <mike.munday@ibm.com>
2018-05-14 22:55:51 +00:00