15 Commits

Author SHA1 Message Date
Nigel Tao ebebc71721 Raise the "always encode as literal" size threshold from 4 to 14.
This isn't an optimization per se, although it does trade off the
"encode 10 bytes" benchmark to favor speed over output size. The point
of this commit is to move closer to what the C++ snappy code does.

benchmark                     old MB/s     new MB/s     speedup
BenchmarkWordsEncode1e1-8     5.77         674.93       116.97x
BenchmarkWordsEncode1e2-8     47.96        47.92        1.00x
BenchmarkWordsEncode1e3-8     190.33       189.48       1.00x
BenchmarkWordsEncode1e4-8     190.25       193.17       1.02x
BenchmarkWordsEncode1e5-8     150.65       151.44       1.01x
BenchmarkWordsEncode1e6-8     180.11       180.63       1.00x
BenchmarkRandomEncode-8       4782.70      4700.25      0.98x
Benchmark_ZFlat0-8            372.49       372.12       1.00x
Benchmark_ZFlat1-8            186.49       187.62       1.01x
Benchmark_ZFlat2-8            4979.47      4891.26      0.98x
Benchmark_ZFlat3-8            85.76        86.16        1.00x
Benchmark_ZFlat4-8            566.31       570.31       1.01x
Benchmark_ZFlat5-8            366.01       366.84       1.00x
Benchmark_ZFlat6-8            162.13       164.18       1.01x
Benchmark_ZFlat7-8            153.69       155.23       1.01x
Benchmark_ZFlat8-8            167.91       169.62       1.01x
Benchmark_ZFlat9-8            147.71       149.43       1.01x
Benchmark_ZFlat10-8           414.06       412.63       1.00x
Benchmark_ZFlat11-8           248.87       247.98       1.00x
2016-04-03 09:25:18 +10:00
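As an illustration of the threshold change in ebebc71721, the following is a minimal sketch of an "always encode as literal" short-circuit. It is not the package's actual code: the names `alwaysLiteralThreshold`, `emitShortLiteral` and `encodeBlock` are hypothetical, the exact comparison (`<` versus `<=`) is an assumption, and the block format's length preamble and the match search are left out. The literal tag layout (length minus one in the upper six bits, `00` in the low two bits, for literals of up to 60 bytes) follows the Snappy format description.

```go
package main

import "fmt"

// alwaysLiteralThreshold is a hypothetical name for the cutoff described in
// the commit. Whether the real comparison is < or <= is an assumption.
const alwaysLiteralThreshold = 14

// emitShortLiteral appends a Snappy literal element for lit, which must be
// between 1 and 60 bytes long: one tag byte holding (len-1)<<2 with the low
// two bits set to 00, followed by the literal bytes themselves.
func emitShortLiteral(dst, lit []byte) []byte {
	dst = append(dst, uint8(len(lit)-1)<<2)
	return append(dst, lit...)
}

// encodeBlock returns a literal-only encoding for tiny inputs. It reports
// false when the caller should fall through to the (omitted) match search.
func encodeBlock(dst, src []byte) ([]byte, bool) {
	if len(src) == 0 || len(src) >= alwaysLiteralThreshold {
		return dst, false
	}
	return emitShortLiteral(dst, src), true
}

func main() {
	out, ok := encodeBlock(nil, []byte("hello"))
	fmt.Printf("% x %v\n", out, ok)
}
```

Skipping the hash table entirely for such short inputs is consistent with the large jump in the WordsEncode1e1 row above, at the cost of never finding matches inside inputs that small.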
Nigel Tao 7ede8d1b13 Eliminate some bounds checks in the encoder.
As per
https://groups.google.com/d/msg/golang-dev/jVP6h21OyL8/Syhfot9XBQAJ,
recent versions of the gc compiler can optimize:

func load32(b []byte, i int32) uint32 {
  b = b[i : i+4 : len(b)]
  return uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24
}

benchmark                     old MB/s     new MB/s     speedup
BenchmarkWordsEncode1e1-8     5.78         5.77         1.00x
BenchmarkWordsEncode1e2-8     47.22        47.96        1.02x
BenchmarkWordsEncode1e3-8     183.53       190.33       1.04x
BenchmarkWordsEncode1e4-8     198.95       190.25       0.96x
BenchmarkWordsEncode1e5-8     144.60       150.65       1.04x
BenchmarkWordsEncode1e6-8     172.11       180.11       1.05x
BenchmarkRandomEncode-8       4547.98      4782.70      1.05x
Benchmark_ZFlat0-8            359.18       372.49       1.04x
Benchmark_ZFlat1-8            181.57       186.49       1.03x
Benchmark_ZFlat2-8            4566.75      4979.47      1.09x
Benchmark_ZFlat3-8            86.00        85.76        1.00x
Benchmark_ZFlat4-8            558.08       566.31       1.01x
Benchmark_ZFlat5-8            354.18       366.01       1.03x
Benchmark_ZFlat6-8            156.20       162.13       1.04x
Benchmark_ZFlat7-8            147.76       153.69       1.04x
Benchmark_ZFlat8-8            162.49       167.91       1.03x
Benchmark_ZFlat9-8            142.33       147.71       1.04x
Benchmark_ZFlat10-8           401.93       414.06       1.03x
Benchmark_ZFlat11-8           235.94       248.87       1.05x
2016-04-03 08:30:29 +10:00
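The load32 helper quoted in 7ede8d1b13 can be exercised on its own. The sketch below assumes the abbreviated middle terms are the usual little-endian shifts (the `uint32(b[3])<<24` term in the commit message suggests this) and compares the result against `binary.LittleEndian.Uint32` from the standard library. It checks the function's result only; it does not measure whether any bounds checks were removed.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// load32 reads a little-endian uint32 starting at b[i].
func load32(b []byte, i int32) uint32 {
	// Re-slicing to exactly four bytes gives the compiler a slice of known
	// length, so the constant indexes 0..3 below need no further checks.
	b = b[i : i+4 : len(b)]
	return uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24
}

func main() {
	src := []byte{0x01, 0x02, 0x03, 0x04, 0x05, 0x06}
	for i := int32(0); i+4 <= int32(len(src)); i++ {
		got := load32(src, i)
		want := binary.LittleEndian.Uint32(src[i:])
		fmt.Printf("i=%d got=%#08x want=%#08x\n", i, got, want)
	}
}
```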
Nigel Tao d1f56d2222 Encode copies of length 65, 66 or 67 as 5 bytes, not 6.
The benchmarks don't show a big change either way, but the output is
shorter, and it matches what the C++ snappy code does.

benchmark                     old MB/s     new MB/s     speedup
BenchmarkWordsEncode1e1-8     5.77         5.77         1.00x
BenchmarkWordsEncode1e2-8     47.15        47.26        1.00x
BenchmarkWordsEncode1e3-8     180.77       183.25       1.01x
BenchmarkWordsEncode1e4-8     202.01       198.96       0.98x
BenchmarkWordsEncode1e5-8     145.66       144.68       0.99x
BenchmarkWordsEncode1e6-8     174.12       172.31       0.99x
BenchmarkRandomEncode-8       4522.91      4495.78      0.99x
Benchmark_ZFlat0-8            359.70       359.79       1.00x
Benchmark_ZFlat1-8            181.18       181.82       1.00x
Benchmark_ZFlat2-8            4612.52      4557.46      0.99x
Benchmark_ZFlat3-8            85.65        84.82        0.99x
Benchmark_ZFlat4-8            559.51       558.52       1.00x
Benchmark_ZFlat5-8            354.88       352.91       0.99x
Benchmark_ZFlat6-8            156.14       156.26       1.00x
Benchmark_ZFlat7-8            148.18       148.12       1.00x
Benchmark_ZFlat8-8            162.68       162.21       1.00x
Benchmark_ZFlat9-8            141.81       142.32       1.00x
Benchmark_ZFlat10-8           399.79       401.94       1.01x
Benchmark_ZFlat11-8           237.43       235.91       0.99x
2016-04-02 16:22:40 +11:00
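The byte counts in d1f56d2222 can be reproduced with a small size model. This is a sketch under stated assumptions rather than the encoder's code: the function names are hypothetical, the "greedy" strategy is only a guess at the older behavior, and only element sizes are computed. The element limits come from the Snappy format: the 2-byte copy form needs a length of 4 to 11 and an offset below 2048, and the 3-byte form covers lengths 1 to 64.

```go
package main

import "fmt"

// copyOpSize returns the encoded size in bytes of a single Snappy copy
// element. The 2-byte form needs 4 <= length <= 11 and offset < 2048;
// this sketch uses the 3-byte form (length 1..64) otherwise.
func copyOpSize(offset, length int) int {
	if 4 <= length && length <= 11 && offset < 2048 {
		return 2
	}
	return 3
}

// greedySize models the assumed older behavior: peel off maximal 64-byte
// copies and encode whatever remains as one more element.
func greedySize(offset, length int) int {
	n := 0
	for length > 64 {
		n += copyOpSize(offset, 64)
		length -= 64
	}
	return n + copyOpSize(offset, length)
}

// balancedSize peels off only 60 bytes when 65..67 bytes remain, so the
// final element is at least 4 bytes long and may use the 2-byte form.
func balancedSize(offset, length int) int {
	n := 0
	for length >= 68 {
		n += copyOpSize(offset, 64)
		length -= 64
	}
	if length > 64 {
		n += copyOpSize(offset, 60)
		length -= 60
	}
	return n + copyOpSize(offset, length)
}

func main() {
	for length := 64; length <= 68; length++ {
		fmt.Printf("length %d: greedy %d bytes, balanced %d bytes\n",
			length, greedySize(100, length), balancedSize(100, length))
	}
}
```

In this model the saving only appears when the offset is below 2048, because the trailing element otherwise has to use the 3-byte form anyway.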
Nigel Tao 624b11c0e0 Fix some comment styles. 2016-02-25 15:30:12 +11:00
Nigel Tao bf2ded9d81 Use 64K blocks when encoding long inputs.
This enables future optimizations, such as an encoder's hash table entry being
uint16 instead of int32.
2016-02-22 12:44:36 +11:00
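To make the reasoning in bf2ded9d81 concrete, here is a minimal sketch of cutting a long input into blocks of at most 64 KiB. The names `maxBlockSize` and `splitBlocks` are illustrative assumptions, and the per-block encoding step is omitted. The point is that every position inside a block is at most 65535, which is what would let a hash table entry fit in a uint16.

```go
package main

import "fmt"

// maxBlockSize is 64 KiB, per the commit title. The name is illustrative.
const maxBlockSize = 65536

// splitBlocks cuts src into consecutive chunks of at most maxBlockSize bytes.
// The chunks alias src; no data is copied.
func splitBlocks(src []byte) [][]byte {
	var blocks [][]byte
	for len(src) > 0 {
		n := len(src)
		if n > maxBlockSize {
			n = maxBlockSize
		}
		blocks = append(blocks, src[:n])
		src = src[n:]
	}
	return blocks
}

func main() {
	for i, b := range splitBlocks(make([]byte, 150000)) {
		// The largest position inside a block is len(b)-1.
		fmt.Printf("block %d: %d bytes, max position fits in uint16: %v\n",
			i, len(b), len(b)-1 <= 0xffff)
	}
}
```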
Nigel Tao d1d908a252 Fix heuristic match skipping.
The heuristic was introduced in 4e2aa98e, based on the C++ Snappy
implementation, but the Go code contained a flawed optimization. The C++ code
used an explicit skip variable:

  uint32 bytes_between_hash_lookups = skip++ >> 5;
  next_ip = ip + bytes_between_hash_lookups;

whereas the Go code optimized this to be an implicit skip:

  s += 1 + (s-lit)>>5

This is equivalent for small s values (relative to lit, the last hash table
hit), but diverges for large ones. This Go program demonstrates the difference:

package main

import "fmt"

// main prints the encoder skipping behavior when seeing no hash table hits.
func main() {
  s0, s1 := 0, 0
  skip := 32
  for i := 0; i < 300; i++ {
    // This is the C++ Snappy algorithm.
    bytes_between_hash_lookups := skip >> 5
    skip++
    s0 += bytes_between_hash_lookups

    // This is the Go Snappy algorithm.
    s1 += 1 + s1>>5

    // The intention was that the Go algorithm behaves the same as the C++
    // one, but it doesn't.
    if i%10 == 0 {
      fmt.Printf("%d\t%d\t%d\n", i, s0, s1)
    }
  }
}

It prints:

0	1	1
10	11	11
20	21	21
30	31	31
40	50	50
50	70	73
60	90	105
70	117	149
80	147	208
90	177	288
100	212	398
110	252	548
120	292	752
130	335	1030
140	385	1408
150	435	1922
160	486	2619
170	546	3568
180	606	4861
190	666	6617
200	735	9005
210	805	12257
220	875	16681
230	952	22697
240	1032	30881
250	1112	42015
260	1197	57161
270	1287	77764
280	1377	105791
290	1470	143914

The C++ algorithm is quadratic. The Go algorithm is exponential.

This commit re-introduces the explicit skip variable, so that the Go
implementation matches the C++ implementation.

For completeness, benchmark numbers are included below, but the worse numbers
merely reflect that the old Go algorithm was too aggressive on skipping ahead
on incompressible input (RandomEncode, ZFlat2 and ZFlat4), and so after an
initial warm-up period, it was essentially performing not much more than a
memcpy. Memcpy is indeed fast in terms of MB/s, but it doesn't compress at all,
which obviously defeats the whole purpose of a compression format like Snappy.

benchmark                     old MB/s     new MB/s     speedup
BenchmarkWordsEncode1e1-4     3.65         3.77         1.03x
BenchmarkWordsEncode1e2-4     29.22        29.35        1.00x
BenchmarkWordsEncode1e3-4     99.46        101.20       1.02x
BenchmarkWordsEncode1e4-4     118.11       121.54       1.03x
BenchmarkWordsEncode1e5-4     90.37        91.72        1.01x
BenchmarkWordsEncode1e6-4     107.49       108.88       1.01x
BenchmarkRandomEncode-4       7679.09      4491.97      0.58x
Benchmark_ZFlat0-4            229.41       233.79       1.02x
Benchmark_ZFlat1-4            115.10       116.83       1.02x
Benchmark_ZFlat2-4            7256.88      3003.79      0.41x
Benchmark_ZFlat3-4            53.39        54.02        1.01x
Benchmark_ZFlat4-4            1873.63      289.28       0.15x
Benchmark_ZFlat5-4            233.29       234.95       1.01x
Benchmark_ZFlat6-4            101.33       102.79       1.01x
Benchmark_ZFlat7-4            95.26        96.63        1.01x
Benchmark_ZFlat8-4            105.66       106.89       1.01x
Benchmark_ZFlat9-4            92.04        93.11        1.01x
Benchmark_ZFlat10-4           265.68       265.93       1.00x
Benchmark_ZFlat11-4           149.72       151.32       1.01x

These numbers were generated on an amd64 machine, but on a different machine
than the one used for other recent commits. The raw MB/s numbers are therefore
not directly comparable, although the speedup numbers should be.
2016-02-14 16:54:35 +11:00
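The "quadratic" versus "exponential" statement in d1d908a252 can also be checked by hand. The following is a short derivation, assuming as in the demonstration program that the skip counter starts at 32 and that no hash table hit ever occurs. Here n is the number of loop iterations, so the printed row i corresponds to n = i + 1.

```latex
% C++ rule: the step taken at iteration k is floor((32 + k) / 32).
s_0(n) = \sum_{k=0}^{n-1} \left\lfloor \frac{32+k}{32} \right\rfloor,
\qquad
s_0(32m) = 32\,(1 + 2 + \dots + m) = 16\,m(m+1) = \frac{n^2}{64} + \frac{n}{2}
\quad (n = 32m).

% Go rule: the step is 1 + floor(s/32), and x < 1 + floor(x) <= 1 + x.
\frac{33}{32}\, s_1(n) \;<\; s_1(n+1) \;\le\; \frac{33}{32}\, s_1(n) + 1
\quad\Longrightarrow\quad
\left(\tfrac{33}{32}\right)^{n-1} \;\le\; s_1(n) \;\le\; 32\left(\left(\tfrac{33}{32}\right)^{n} - 1\right)
\quad (n \ge 1).
```

As a sanity check against the printed table, row 290 corresponds to n = 291 = 288 + 3, and 16 · 9 · 10 + 3 · 10 = 1470, which matches the C++ column. The Go column's value of 143914 lies between the two geometric bounds, roughly 7.5 × 10^3 and 2.5 × 10^5.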
Nigel Tao c2359a1bd0 Catch MaxEncodedLen overflow. 2016-02-13 20:09:53 +11:00
Nigel Tao cc71ae7cc5 Change the encoder's hash table values from int to int32.
Doing s/int/int32/ in "var table [maxTableSize]int" saves 64 KiB of
stack space that needed zeroing. maxTableSize is 1<<14, or 16384.

The benchmarks show the biggest effect for small src lengths, or for
mostly incompressible data such as the JPEG file (possibly because the
multiple-byte skipping means that the src is effectively short).

On amd64:

benchmark                     old MB/s     new MB/s     speedup
BenchmarkWordsEncode1e1-8     3.05         5.71         1.87x
BenchmarkWordsEncode1e2-8     26.98        44.87        1.66x
BenchmarkWordsEncode1e3-8     130.87       156.72       1.20x
BenchmarkWordsEncode1e4-8     162.48       180.89       1.11x
BenchmarkWordsEncode1e5-8     132.35       131.27       0.99x
BenchmarkWordsEncode1e6-8     159.97       158.49       0.99x
BenchmarkRandomEncode-8       12340.86     13485.69     1.09x
Benchmark_ZFlat0-8            329.92       329.17       1.00x
Benchmark_ZFlat1-8            165.06       164.46       1.00x
Benchmark_ZFlat2-8            8955.25      10530.49     1.18x
Benchmark_ZFlat3-8            47.79        80.06        1.68x
Benchmark_ZFlat4-8            2650.55      2732.00      1.03x
Benchmark_ZFlat5-8            336.52       334.94       1.00x
Benchmark_ZFlat6-8            147.99       145.85       0.99x
Benchmark_ZFlat7-8            136.32       137.20       1.01x
Benchmark_ZFlat8-8            153.03       152.15       0.99x
Benchmark_ZFlat9-8            133.18       131.74       0.99x
Benchmark_ZFlat10-8           376.02       378.28       1.01x
Benchmark_ZFlat11-8           224.16       216.81       0.97x

Thanks to Klaus Post for the original suggestion on
https://github.com/golang/snappy/pull/23 but I hesitate to accept that
pull request in its entirety as it makes many changes, some more
complicated than this separable, self-contained s/int/int32/ change.
2016-02-13 14:11:38 +11:00
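The 64 KiB figure in cc71ae7cc5 follows directly from the array sizes. The sketch below computes them with `unsafe.Sizeof`. The helper name `tableSizes` is hypothetical, and the saving shown assumes a platform where `int` is 8 bytes, such as amd64. On a 32-bit platform both tables would be the same size.

```go
package main

import (
	"fmt"
	"unsafe"
)

// maxTableSize is 1<<14 (16384) entries, as stated in the commit message.
const maxTableSize = 1 << 14

// tableSizes returns the size in bytes of the hash table before and after
// the change. The int size depends on the platform; int32 is always 4 bytes.
func tableSizes() (oldBytes, newBytes uintptr) {
	var oldTable [maxTableSize]int
	var newTable [maxTableSize]int32
	return unsafe.Sizeof(oldTable), unsafe.Sizeof(newTable)
}

func main() {
	oldBytes, newBytes := tableSizes()
	fmt.Printf("int table:   %d KiB\n", oldBytes>>10)
	fmt.Printf("int32 table: %d KiB\n", newBytes>>10)
	fmt.Printf("saved:       %d KiB\n", (oldBytes-newBytes)>>10)
}
```

Because the table is a local array that must be cleared on every call, halving it reduces a fixed per-call cost, which fits the commit's observation that short inputs benefit most.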
Nigel Tao 07070fd417 Catch overflow when incrementing src pointers. 2016-02-12 16:49:07 +11:00
Nigel Tao 799c780093 Reduce the number of Write calls to the underlying io.Writer. 2016-02-11 17:10:40 +11:00
Nigel Tao 0fd139378b Add NewBufferedWriter, and Flush and Close methods.
Deprecate NewWriter.

See the discussion on
https://groups.google.com/d/topic/golang-dev/nXp12KmMSvM/discussion
2016-02-11 15:44:51 +11:00
Nigel Tao 4e2aa98ebb Skip multiple bytes if the last match was >= 32 bytes prior.
benchmark                     old MB/s     new MB/s     speedup
BenchmarkWordsEncode1e3-8     137.99       132.57       0.96x
BenchmarkWordsEncode1e4-8     173.30       156.26       0.90x
BenchmarkWordsEncode1e5-8     137.16       132.59       0.97x
BenchmarkWordsEncode1e6-8     165.45       164.47       0.99x
BenchmarkRandomEncode-8       140.04       12260.44     87.55x
Benchmark_ZFlat0-8            334.14       335.84       1.01x
Benchmark_ZFlat1-8            168.93       168.19       1.00x
Benchmark_ZFlat2-8            134.42       8763.96      65.20x
Benchmark_ZFlat3-8            48.04        47.36        0.99x
Benchmark_ZFlat4-8            151.86       2578.12      16.98x
Benchmark_ZFlat5-8            344.43       341.94       0.99x
Benchmark_ZFlat6-8            149.21       147.24       0.99x
Benchmark_ZFlat7-8            140.87       138.72       0.98x
Benchmark_ZFlat8-8            155.95       155.89       1.00x
Benchmark_ZFlat9-8            135.05       136.07       1.01x
Benchmark_ZFlat10-8           380.98       379.77       1.00x
Benchmark_ZFlat11-8           227.48       226.59       1.00x

Thanks to Klaus Post for the original suggestion. Unfortunately,
https://github.com/golang/snappy/pull/19 was abandoned.
2016-02-10 14:31:44 +11:00
Damian Gryski ec7b924342 C++ snappy has moved to github 2015-07-21 09:14:30 +02:00
Nigel Tao 2a6d64140d Have Encode return []byte instead of ([]byte, error).
Encoding can never fail, and returning an error is inconsistent with the
standard library's encoding/{ascii85,hex,pem} packages.

Fixes #8.
2015-07-17 17:21:07 +10:00
Sebastien Binet c5eccb269a all: simpler import path 2015-06-11 10:18:30 +02:00