crypto/chacha20/chacha_test.go


// Copyright 2016 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

package chacha20

import (
	"bytes"
	"encoding/hex"
	"fmt"
	"math/rand"
	"testing"
)
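func _() {
	// Assert that bufSize is a multiple of blockSize. Indexing a one-element
	// array with a constant other than 0 is a compile-time error, so this
	// only builds when bufSize%blockSize == 0.
	var b [1]byte
	_ = b[bufSize%blockSize]
}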
func hexDecode(s string) []byte {
	ss, err := hex.DecodeString(s)
	if err != nil {
		panic(fmt.Sprintf("cannot decode input %#v: %v", s, err))
	}
	return ss
}
// Run the test cases with the input and output in different buffers.
func TestNoOverlap(t *testing.T) {
	for _, c := range testVectors {
		s, _ := NewUnauthenticatedCipher(hexDecode(c.key), hexDecode(c.nonce))
		input := hexDecode(c.input)
		output := make([]byte, len(input))
		s.XORKeyStream(output, input)
		got := hex.EncodeToString(output)
		if got != c.output {
			t.Errorf("length=%v: got %#v, want %#v", len(input), got, c.output)
		}
	}
}
// Run the test cases with the input and output overlapping entirely.
func TestOverlap(t *testing.T) {
	for _, c := range testVectors {
		s, _ := NewUnauthenticatedCipher(hexDecode(c.key), hexDecode(c.nonce))
		data := hexDecode(c.input)
		s.XORKeyStream(data, data)
		got := hex.EncodeToString(data)
		if got != c.output {
			t.Errorf("length=%v: got %#v, want %#v", len(data), got, c.output)
		}
	}
}
// Run the test cases with various source and destination offsets.
func TestUnaligned(t *testing.T) {
	const max = 8 // max offset (+1) to test
	for _, c := range testVectors {
		data := hexDecode(c.input)
		input := make([]byte, len(data)+max)
		output := make([]byte, len(data)+max)
		for i := 0; i < max; i++ { // input offsets
			for j := 0; j < max; j++ { // output offsets
				s, _ := NewUnauthenticatedCipher(hexDecode(c.key), hexDecode(c.nonce))
input := input[i : i+len(data)]
output := output[j : j+len(data)]
copy(input, data)
s.XORKeyStream(output, input)
got := hex.EncodeToString(output)
if got != c.output {
t.Errorf("length=%v: got %#v, want %#v", len(data), got, c.output)
}
}
}
}
}
// Run the test cases by calling XORKeyStream multiple times.
func TestStep(t *testing.T) {
// wide range of step sizes to try and hit edge cases
steps := [...]int{1, 3, 4, 7, 8, 17, 24, 30, 64, 256}
rnd := rand.New(rand.NewSource(123))
for _, c := range testVectors {
s, _ := NewUnauthenticatedCipher(hexDecode(c.key), hexDecode(c.nonce))
input := hexDecode(c.input)
output := make([]byte, len(input))
// step through the buffers
i, step := 0, steps[rnd.Intn(len(steps))]
for i+step < len(input) {
s.XORKeyStream(output[i:i+step], input[i:i+step])
if i+step < len(input) && output[i+step] != 0 {
t.Errorf("length=%v, i=%v, step=%v: output overwritten", len(input), i, step)
}
i += step
step = steps[rnd.Intn(len(steps))]
}
// finish the encryption
s.XORKeyStream(output[i:], input[i:])
// ensure we tolerate a call with an empty input
s.XORKeyStream(output[len(output):], input[len(input):])
got := hex.EncodeToString(output)
if got != c.output {
t.Errorf("length=%v: got %#v, want %#v", len(input), got, c.output)
}
}
}
func TestSetCounter(t *testing.T) {
newCipher := func() *Cipher {
s, _ := NewUnauthenticatedCipher(make([]byte, KeySize), make([]byte, NonceSize))
return s
}
s := newCipher()
src := bytes.Repeat([]byte("test"), 32) // two 64-byte blocks
dst1 := make([]byte, len(src))
s.XORKeyStream(dst1, src)
// advance counter to 1 and xor second block
s = newCipher()
s.SetCounter(1)
dst2 := make([]byte, len(src))
s.XORKeyStream(dst2[64:], src[64:])
if !bytes.Equal(dst1[64:], dst2[64:]) {
t.Error("failed to produce identical output using SetCounter")
}
// test again with unaligned blocks; SetCounter should reset the buffer
s = newCipher()
s.XORKeyStream(dst1[:70], src[:70])
s = newCipher()
s.XORKeyStream([]byte{0}, []byte{0})
s.SetCounter(1)
s.XORKeyStream(dst2[64:70], src[64:70])
if !bytes.Equal(dst1[64:70], dst2[64:70]) {
t.Error("SetCounter did not reset buffer")
}
// advancing to a lower counter value should cause a panic
panics := func(fn func()) (p bool) {
defer func() { p = recover() != nil }()
fn()
return
}
if !panics(func() { s.SetCounter(0) }) {
t.Error("counter decreasing should trigger a panic")
}
}
func TestLastBlock(t *testing.T) {
panics := func(fn func()) (p bool) {
defer func() { p = recover() != nil }()
fn()
return
}
checkLastBlock := func(b []byte) {
t.Helper()
// Hardcoded result to check all implementations generate the same output.
lastBlock := "ace4cd09e294d1912d4ad205d06f95d9c2f2bfcf453e8753f128765b62215f4d" +
"92c74f2f626c6a640c0b1284d839ec81f1696281dafc3e684593937023b58b1d"
if got := hex.EncodeToString(b); got != lastBlock {
t.Errorf("wrong output for the last block, got %q, want %q", got, lastBlock)
}
}
// setting the counter to 0xffffffff and crypting multiple blocks should
// trigger a panic
s, _ := NewUnauthenticatedCipher(make([]byte, KeySize), make([]byte, NonceSize))
s.SetCounter(0xffffffff)
blocks := make([]byte, blockSize*2)
if !panics(func() { s.XORKeyStream(blocks, blocks) }) {
t.Error("crypting multiple blocks should trigger a panic")
}
// setting the counter to 0xffffffff - 1 and crypting two blocks should not
// trigger a panic
s, _ = NewUnauthenticatedCipher(make([]byte, KeySize), make([]byte, NonceSize))
s.SetCounter(0xffffffff - 1)
if panics(func() { s.XORKeyStream(blocks, blocks) }) {
t.Error("crypting the last blocks should not trigger a panic")
}
checkLastBlock(blocks[blockSize:])
// once all the keystream is spent, setting the counter should panic
if !panics(func() { s.SetCounter(0xffffffff) }) {
t.Error("setting the counter after overflow should trigger a panic")
}
// crypting a subsequent block *should* panic
block := make([]byte, blockSize)
if !panics(func() { s.XORKeyStream(block, block) }) {
t.Error("crypting after overflow should trigger a panic")
}
// if we crypt less than a full block, we should be able to crypt the rest
// in a subsequent call without panicking
s, _ = NewUnauthenticatedCipher(make([]byte, KeySize), make([]byte, NonceSize))
s.SetCounter(0xffffffff)
if panics(func() { s.XORKeyStream(block[:7], block[:7]) }) {
t.Error("crypting part of the last block should not trigger a panic")
}
if panics(func() { s.XORKeyStream(block[7:], block[7:]) }) {
t.Error("crypting part of the last block should not trigger a panic")
}
checkLastBlock(block)
// as before, a third call should trigger a panic because all keystream is spent
if !panics(func() { s.XORKeyStream(block[:1], block[:1]) }) {
t.Error("crypting after overflow should trigger a panic")
}
}
func benchmarkChaCha20(b *testing.B, step, count int) {
tot := step * count
src := make([]byte, tot)
dst := make([]byte, tot)
key := make([]byte, KeySize)
nonce := make([]byte, NonceSize)
b.SetBytes(int64(tot))
b.ResetTimer()
for i := 0; i < b.N; i++ {
c, _ := NewUnauthenticatedCipher(key, nonce)
for i := 0; i < tot; i += step {
c.XORKeyStream(dst[i:], src[i:i+step])
}
}
}
// BenchmarkChaCha20 runs benchmarkChaCha20 over several size configurations.
// Each sub-benchmark name mirrors the two integer arguments passed to the
// helper; a name with a single number means the second argument is 1.
func BenchmarkChaCha20(b *testing.B) {
	b.Run("64", func(b *testing.B) {
		benchmarkChaCha20(b, 64, 1)
	})
	b.Run("256", func(b *testing.B) {
		benchmarkChaCha20(b, 256, 1)
	})
	b.Run("10x25", func(b *testing.B) {
		benchmarkChaCha20(b, 10, 25)
	})
	b.Run("4096", func(b *testing.B) {
		benchmarkChaCha20(b, 4096, 1)
	})
	b.Run("100x40", func(b *testing.B) {
		benchmarkChaCha20(b, 100, 40)
	})
	b.Run("65536", func(b *testing.B) {
		benchmarkChaCha20(b, 65536, 1)
	})
	b.Run("1000x65", func(b *testing.B) {
		benchmarkChaCha20(b, 1000, 65)
	})
}
// TestHChaCha20 checks HChaCha20 against a known-answer test vector.
func TestHChaCha20(t *testing.T) {
	// See draft-irtf-cfrg-xchacha-00, Section 2.2.1.
	key := []byte{0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
		0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
		0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
		0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f}
	nonce := []byte{0x00, 0x00, 0x00, 0x09, 0x00, 0x00, 0x00, 0x4a,
		0x00, 0x00, 0x00, 0x00, 0x31, 0x41, 0x59, 0x27}
	expected := []byte{0x82, 0x41, 0x3b, 0x42, 0x27, 0xb2, 0x7b, 0xfe,
		0xd3, 0x0e, 0x42, 0x50, 0x8a, 0x87, 0x7d, 0x73,
		0xa0, 0xf9, 0xe4, 0xd5, 0x8a, 0x74, 0xa8, 0x53,
		0xc1, 0x2e, 0xc4, 0x13, 0x26, 0xd3, 0xec, 0xdc,
	}
	// key and nonce are already slices, so no re-slicing is needed.
	result, err := HChaCha20(key, nonce)
	if err != nil {
		t.Fatal(err)
	}
	if !bytes.Equal(expected, result) {
		t.Errorf("want %x, got %x", expected, result)
	}
}