Nigel Tao
ff6b7dc882
Add comments re handling block and stream formats
2019-09-04 16:35:34 +10:00
fatedier
0d9c4c05f1
fix typo
2017-01-25 15:07:54 +08:00
Nigel Tao
988ce01844
Add a fast path for short emitLiteral calls.
...
Compared to the previous commit:
name old speed new speed delta
WordsEncode1e1-8 667MB/s ± 0% 677MB/s ± 1% +1.57% (p=0.008 n=5+5)
WordsEncode1e2-8 353MB/s ± 1% 428MB/s ± 0% +21.37% (p=0.008 n=5+5)
WordsEncode1e3-8 383MB/s ± 1% 446MB/s ± 1% +16.65% (p=0.008 n=5+5)
WordsEncode1e4-8 277MB/s ± 1% 316MB/s ± 0% +13.93% (p=0.008 n=5+5)
WordsEncode1e5-8 248MB/s ± 0% 269MB/s ± 0% +8.57% (p=0.008 n=5+5)
WordsEncode1e6-8 296MB/s ± 0% 314MB/s ± 1% +6.08% (p=0.008 n=5+5)
RandomEncode-8 14.4GB/s ± 2% 14.4GB/s ± 1% ~ (p=1.000 n=5+5)
_ZFlat0-8 748MB/s ± 0% 792MB/s ± 0% +5.87% (p=0.008 n=5+5)
_ZFlat1-8 406MB/s ± 0% 436MB/s ± 1% +7.42% (p=0.008 n=5+5)
_ZFlat2-8 16.1GB/s ± 1% 16.2GB/s ± 1% ~ (p=0.421 n=5+5)
_ZFlat3-8 604MB/s ± 0% 632MB/s ± 1% +4.49% (p=0.008 n=5+5)
_ZFlat4-8 7.62GB/s ± 1% 8.00GB/s ± 0% +5.03% (p=0.008 n=5+5)
_ZFlat5-8 729MB/s ± 0% 768MB/s ± 0% +5.26% (p=0.008 n=5+5)
_ZFlat6-8 267MB/s ± 0% 282MB/s ± 1% +5.92% (p=0.008 n=5+5)
_ZFlat7-8 248MB/s ± 0% 264MB/s ± 1% +6.48% (p=0.008 n=5+5)
_ZFlat8-8 282MB/s ± 0% 298MB/s ± 0% +5.87% (p=0.008 n=5+5)
_ZFlat9-8 231MB/s ± 0% 247MB/s ± 0% +6.79% (p=0.008 n=5+5)
_ZFlat10-8 972MB/s ± 0% 1027MB/s ± 0% +5.64% (p=0.008 n=5+5)
_ZFlat11-8 401MB/s ± 0% 411MB/s ± 0% +2.43% (p=0.008 n=5+5)
The net effect of the past three commits, when compared to just before 68801229 "Write the encoder's encodeBlock in asm":
name old speed new speed delta
WordsEncode1e1-8 665MB/s ± 0% 677MB/s ± 1% +1.80% (p=0.016 n=4+5)
WordsEncode1e2-8 85.0MB/s ± 0% 428.3MB/s ± 0% +403.65% (p=0.016 n=4+5)
WordsEncode1e3-8 234MB/s ± 2% 446MB/s ± 1% +90.90% (p=0.008 n=5+5)
WordsEncode1e4-8 233MB/s ± 0% 316MB/s ± 0% +35.22% (p=0.008 n=5+5)
WordsEncode1e5-8 214MB/s ± 1% 269MB/s ± 0% +25.45% (p=0.008 n=5+5)
WordsEncode1e6-8 258MB/s ± 0% 314MB/s ± 1% +21.82% (p=0.008 n=5+5)
RandomEncode-8 13.1GB/s ± 1% 14.4GB/s ± 1% +10.31% (p=0.008 n=5+5)
_ZFlat0-8 630MB/s ± 0% 792MB/s ± 0% +25.71% (p=0.016 n=4+5)
_ZFlat1-8 326MB/s ± 0% 436MB/s ± 1% +33.89% (p=0.016 n=4+5)
_ZFlat2-8 13.9GB/s ± 1% 16.2GB/s ± 1% +16.27% (p=0.008 n=5+5)
_ZFlat3-8 177MB/s ± 1% 632MB/s ± 1% +257.58% (p=0.008 n=5+5)
_ZFlat4-8 6.19GB/s ± 1% 8.00GB/s ± 0% +29.32% (p=0.008 n=5+5)
_ZFlat5-8 615MB/s ± 0% 768MB/s ± 0% +24.91% (p=0.008 n=5+5)
_ZFlat6-8 231MB/s ± 0% 282MB/s ± 1% +21.95% (p=0.008 n=5+5)
_ZFlat7-8 215MB/s ± 1% 264MB/s ± 1% +22.83% (p=0.008 n=5+5)
_ZFlat8-8 246MB/s ± 0% 298MB/s ± 0% +21.46% (p=0.008 n=5+5)
_ZFlat9-8 202MB/s ± 0% 247MB/s ± 0% +22.17% (p=0.008 n=5+5)
_ZFlat10-8 803MB/s ± 0% 1027MB/s ± 0% +27.93% (p=0.008 n=5+5)
_ZFlat11-8 351MB/s ± 0% 411MB/s ± 0% +16.92% (p=0.008 n=5+5)
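For context on why a dedicated fast path helps: short literals need only a single header byte before the raw bytes. A sketch of the literal header per the Snappy format description (tagLiteral is 0x00 in the low two bits; the 3- and 4-byte length forms for very long literals are elided, and this is not this commit's exact code):

```go
package main

import "fmt"

// literalHeader appends the Snappy tag bytes for a literal of the given
// length. Lengths up to 60 fit entirely in the one tag byte, which is the
// common case a short-literal fast path targets.
func literalHeader(dst []byte, length int) []byte {
	n := length - 1
	switch {
	case n < 60:
		return append(dst, uint8(n)<<2) // 1-byte form: (len-1)<<2 | tagLiteral
	case n < 1<<8:
		return append(dst, 60<<2, uint8(n)) // 2-byte form
	default:
		return append(dst, 61<<2, uint8(n), uint8(n>>8)) // 3-byte form
	}
}

func main() {
	fmt.Printf("% x\n", literalHeader(nil, 10))  // one header byte
	fmt.Printf("% x\n", literalHeader(nil, 100)) // two header bytes
}
```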
2016-04-23 14:49:49 +10:00
Nigel Tao
6880122951
Write the encoder's encodeBlock in asm.
...
name old speed new speed delta
WordsEncode1e1-8 665MB/s ± 0% 678MB/s ± 0% +2.00% (p=0.016 n=4+5)
WordsEncode1e2-8 85.0MB/s ± 0% 90.1MB/s ± 0% +5.90% (p=0.016 n=4+5)
WordsEncode1e3-8 234MB/s ± 2% 295MB/s ± 0% +26.20% (p=0.008 n=5+5)
WordsEncode1e4-8 233MB/s ± 0% 276MB/s ± 0% +18.31% (p=0.008 n=5+5)
WordsEncode1e5-8 214MB/s ± 1% 248MB/s ± 0% +15.52% (p=0.008 n=5+5)
WordsEncode1e6-8 258MB/s ± 0% 295MB/s ± 0% +14.62% (p=0.008 n=5+5)
RandomEncode-8 13.1GB/s ± 1% 14.4GB/s ± 1% +10.27% (p=0.008 n=5+5)
_ZFlat0-8 630MB/s ± 0% 749MB/s ± 0% +18.96% (p=0.016 n=4+5)
_ZFlat1-8 326MB/s ± 0% 405MB/s ± 0% +24.41% (p=0.029 n=4+4)
_ZFlat2-8 13.9GB/s ± 1% 16.2GB/s ± 1% +16.04% (p=0.008 n=5+5)
_ZFlat3-8 177MB/s ± 1% 202MB/s ± 1% +14.51% (p=0.008 n=5+5)
_ZFlat4-8 6.19GB/s ± 1% 7.59GB/s ± 1% +22.64% (p=0.008 n=5+5)
_ZFlat5-8 615MB/s ± 0% 728MB/s ± 1% +18.45% (p=0.008 n=5+5)
_ZFlat6-8 231MB/s ± 0% 266MB/s ± 1% +15.00% (p=0.008 n=5+5)
_ZFlat7-8 215MB/s ± 1% 248MB/s ± 0% +15.30% (p=0.008 n=5+5)
_ZFlat8-8 246MB/s ± 0% 282MB/s ± 0% +14.73% (p=0.016 n=5+4)
_ZFlat9-8 202MB/s ± 0% 231MB/s ± 0% +14.13% (p=0.008 n=5+5)
_ZFlat10-8 803MB/s ± 0% 970MB/s ± 0% +20.90% (p=0.008 n=5+5)
_ZFlat11-8 351MB/s ± 0% 402MB/s ± 0% +14.29% (p=0.008 n=5+5)
2016-04-23 14:21:06 +10:00
Nigel Tao
17e435849f
Restrict the scope of the tableSize variable.
...
It's really just a style change, not an optimization, but for the
record, the numbers don't show a strong change either way, and could
easily just be noise.
name old speed new speed delta
WordsEncode1e1-8 667MB/s ± 0% 665MB/s ± 0% ~ (p=0.190 n=5+4)
WordsEncode1e2-8 85.1MB/s ± 0% 85.0MB/s ± 0% ~ (p=0.556 n=5+4)
WordsEncode1e3-8 235MB/s ± 0% 234MB/s ± 2% ~ (p=0.690 n=5+5)
WordsEncode1e4-8 234MB/s ± 0% 233MB/s ± 0% ~ (p=0.151 n=5+5)
WordsEncode1e5-8 216MB/s ± 0% 214MB/s ± 1% -0.61% (p=0.008 n=5+5)
WordsEncode1e6-8 258MB/s ± 0% 258MB/s ± 0% -0.29% (p=0.024 n=5+5)
RandomEncode-8 13.2GB/s ± 1% 13.1GB/s ± 1% ~ (p=0.056 n=5+5)
_ZFlat0-8 629MB/s ± 0% 630MB/s ± 0% ~ (p=0.111 n=5+4)
_ZFlat1-8 325MB/s ± 0% 326MB/s ± 0% +0.27% (p=0.016 n=5+4)
_ZFlat2-8 13.7GB/s ± 5% 13.9GB/s ± 1% ~ (p=0.310 n=5+5)
_ZFlat3-8 177MB/s ± 0% 177MB/s ± 1% ~ (p=0.690 n=5+5)
_ZFlat4-8 6.15GB/s ± 2% 6.19GB/s ± 1% ~ (p=0.222 n=5+5)
_ZFlat5-8 614MB/s ± 0% 615MB/s ± 0% ~ (p=0.310 n=5+5)
_ZFlat6-8 231MB/s ± 2% 231MB/s ± 0% ~ (p=0.690 n=5+5)
_ZFlat7-8 215MB/s ± 2% 215MB/s ± 1% ~ (p=0.222 n=5+5)
_ZFlat8-8 246MB/s ± 0% 246MB/s ± 0% ~ (p=0.190 n=4+5)
_ZFlat9-8 202MB/s ± 0% 202MB/s ± 0% ~ (p=0.683 n=4+5)
_ZFlat10-8 794MB/s ± 2% 803MB/s ± 0% +1.13% (p=0.008 n=5+5)
_ZFlat11-8 350MB/s ± 0% 351MB/s ± 0% +0.25% (p=0.032 n=4+5)
2016-04-23 12:28:42 +10:00
Nigel Tao
0c43e98dfe
Add comment that dst and src must not overlap.
2016-04-23 12:17:41 +10:00
Nigel Tao
62bb72da9a
Write the encoder's emitLiteral in asm.
...
name old speed new speed delta
WordsEncode1e1-8 665MB/s ± 0% 667MB/s ± 0% +0.35% (p=0.008 n=5+5)
WordsEncode1e2-8 83.8MB/s ± 1% 85.1MB/s ± 0% +1.47% (p=0.008 n=5+5)
WordsEncode1e3-8 231MB/s ± 1% 235MB/s ± 0% +1.81% (p=0.008 n=5+5)
WordsEncode1e4-8 232MB/s ± 1% 234MB/s ± 0% +0.78% (p=0.016 n=5+5)
WordsEncode1e5-8 212MB/s ± 1% 216MB/s ± 0% +1.55% (p=0.008 n=5+5)
WordsEncode1e6-8 257MB/s ± 0% 258MB/s ± 0% +0.68% (p=0.008 n=5+5)
RandomEncode-8 13.2GB/s ± 1% 13.2GB/s ± 1% ~ (p=0.548 n=5+5)
_ZFlat0-8 629MB/s ± 0% 629MB/s ± 0% ~ (p=0.690 n=5+5)
_ZFlat1-8 324MB/s ± 0% 325MB/s ± 0% ~ (p=0.222 n=5+5)
_ZFlat2-8 13.9GB/s ± 1% 13.7GB/s ± 5% ~ (p=0.310 n=5+5)
_ZFlat3-8 176MB/s ± 1% 177MB/s ± 0% ~ (p=0.548 n=5+5)
_ZFlat4-8 6.12GB/s ± 0% 6.15GB/s ± 2% ~ (p=0.151 n=5+5)
_ZFlat5-8 614MB/s ± 0% 614MB/s ± 0% ~ (p=0.548 n=5+5)
_ZFlat6-8 230MB/s ± 0% 231MB/s ± 2% ~ (p=0.151 n=5+5)
_ZFlat7-8 214MB/s ± 0% 215MB/s ± 2% ~ (p=0.151 n=5+5)
_ZFlat8-8 244MB/s ± 0% 246MB/s ± 0% +0.71% (p=0.016 n=5+4)
_ZFlat9-8 200MB/s ± 0% 202MB/s ± 0% +0.95% (p=0.016 n=5+4)
_ZFlat10-8 797MB/s ± 0% 794MB/s ± 2% ~ (p=1.000 n=5+5)
_ZFlat11-8 351MB/s ± 1% 350MB/s ± 0% ~ (p=0.730 n=5+4)
2016-04-23 12:05:56 +10:00
Nigel Tao
d8211ff0ee
Write the encoder's emitCopy in asm.
...
name old speed new speed delta
WordsEncode1e1-8 690MB/s ± 0% 665MB/s ± 0% -3.64% (p=0.008 n=5+5)
WordsEncode1e2-8 83.7MB/s ± 1% 83.8MB/s ± 1% ~ (p=0.421 n=5+5)
WordsEncode1e3-8 230MB/s ± 1% 231MB/s ± 1% ~ (p=0.421 n=5+5)
WordsEncode1e4-8 233MB/s ± 1% 232MB/s ± 1% ~ (p=0.151 n=5+5)
WordsEncode1e5-8 212MB/s ± 0% 212MB/s ± 1% ~ (p=1.000 n=5+5)
WordsEncode1e6-8 255MB/s ± 0% 257MB/s ± 0% +0.57% (p=0.008 n=5+5)
RandomEncode-8 13.2GB/s ± 1% 13.2GB/s ± 1% ~ (p=0.151 n=5+5)
_ZFlat0-8 623MB/s ± 0% 629MB/s ± 0% +0.93% (p=0.008 n=5+5)
_ZFlat1-8 319MB/s ± 1% 324MB/s ± 0% +1.65% (p=0.008 n=5+5)
_ZFlat2-8 13.9GB/s ± 1% 13.9GB/s ± 1% ~ (p=0.548 n=5+5)
_ZFlat3-8 176MB/s ± 0% 176MB/s ± 1% ~ (p=0.690 n=5+5)
_ZFlat4-8 6.05GB/s ± 0% 6.12GB/s ± 0% +1.20% (p=0.008 n=5+5)
_ZFlat5-8 603MB/s ± 0% 614MB/s ± 0% +1.71% (p=0.008 n=5+5)
_ZFlat6-8 228MB/s ± 0% 230MB/s ± 0% +0.83% (p=0.008 n=5+5)
_ZFlat7-8 212MB/s ± 0% 214MB/s ± 0% +0.74% (p=0.008 n=5+5)
_ZFlat8-8 242MB/s ± 0% 244MB/s ± 0% +0.99% (p=0.008 n=5+5)
_ZFlat9-8 199MB/s ± 1% 200MB/s ± 0% +0.57% (p=0.008 n=5+5)
_ZFlat10-8 796MB/s ± 1% 797MB/s ± 0% ~ (p=1.000 n=5+5)
_ZFlat11-8 348MB/s ± 0% 351MB/s ± 1% ~ (p=0.056 n=5+5)
I'm not overly worried about the WordsEncode1e1-8 change: the time/op is
around 15 nanoseconds, which is tiny. In comparison, _ZFlat0-8 takes
around 163 microseconds (note µs not ns).
2016-04-23 11:48:21 +10:00
Nigel Tao
4f2f9a13dd
Write the encoder's extendMatch in asm.
...
name old speed new speed delta
WordsEncode1e1-8 678MB/s ± 0% 690MB/s ± 0% +1.79% (p=0.008 n=5+5)
WordsEncode1e2-8 87.5MB/s ± 0% 83.7MB/s ± 1% -4.26% (p=0.008 n=5+5)
WordsEncode1e3-8 257MB/s ± 1% 230MB/s ± 1% -10.41% (p=0.008 n=5+5)
WordsEncode1e4-8 247MB/s ± 1% 233MB/s ± 1% -5.56% (p=0.008 n=5+5)
WordsEncode1e5-8 186MB/s ± 0% 212MB/s ± 0% +14.36% (p=0.008 n=5+5)
WordsEncode1e6-8 211MB/s ± 0% 255MB/s ± 0% +20.82% (p=0.008 n=5+5)
RandomEncode-8 13.1GB/s ± 2% 13.2GB/s ± 1% ~ (p=0.222 n=5+5)
_ZFlat0-8 433MB/s ± 0% 623MB/s ± 0% +43.92% (p=0.008 n=5+5)
_ZFlat1-8 276MB/s ± 0% 319MB/s ± 1% +15.42% (p=0.008 n=5+5)
_ZFlat2-8 13.8GB/s ± 1% 13.9GB/s ± 1% ~ (p=0.222 n=5+5)
_ZFlat3-8 170MB/s ± 0% 176MB/s ± 0% +3.55% (p=0.008 n=5+5)
_ZFlat4-8 3.09GB/s ± 1% 6.05GB/s ± 0% +96.00% (p=0.008 n=5+5)
_ZFlat5-8 427MB/s ± 1% 603MB/s ± 0% +41.35% (p=0.008 n=5+5)
_ZFlat6-8 190MB/s ± 0% 228MB/s ± 0% +20.24% (p=0.008 n=5+5)
_ZFlat7-8 182MB/s ± 0% 212MB/s ± 0% +16.87% (p=0.008 n=5+5)
_ZFlat8-8 200MB/s ± 0% 242MB/s ± 0% +20.97% (p=0.008 n=5+5)
_ZFlat9-8 175MB/s ± 0% 199MB/s ± 1% +13.74% (p=0.008 n=5+5)
_ZFlat10-8 507MB/s ± 0% 796MB/s ± 1% +56.83% (p=0.008 n=5+5)
_ZFlat11-8 278MB/s ± 0% 348MB/s ± 0% +25.09% (p=0.008 n=5+5)
name old time/op new time/op delta
ExtendMatch-8 16.5µs ± 1% 7.8µs ± 1% -52.93% (p=0.008 n=5+5)
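For reference, a plausible pure-Go form of the routine being ported to asm is only a few lines (the in-tree version may differ in details; the contract assumed here is that it returns the largest k such that src[i:k] and src[j:k] have the same contents, given 0 <= i < j <= len(src)):

```go
package main

import "fmt"

// extendMatch walks i and j forward in lockstep while the bytes match,
// returning the index just past the end of the match at j.
func extendMatch(src []byte, i, j int) int {
	for ; j < len(src) && src[i] == src[j]; i, j = i+1, j+1 {
	}
	return j
}

func main() {
	src := []byte("abcabcabcx")
	// A match starting at offsets 0 and 3 extends through the repeated
	// "abc" run and stops at the 'x'.
	fmt.Println(extendMatch(src, 0, 3))
}
```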
2016-04-23 11:26:04 +10:00
Nigel Tao
1f4d362d6d
Clarify the emitLiteral and emitCopy preconditions.
...
This allows deleting some redundant code.
name old speed new speed delta
WordsEncode1e1-8 679MB/s ± 0% 678MB/s ± 0% ~ (p=0.087 n=5+5)
WordsEncode1e2-8 87.5MB/s ± 0% 87.5MB/s ± 0% ~ (p=0.579 n=5+5)
WordsEncode1e3-8 258MB/s ± 0% 257MB/s ± 1% ~ (p=1.000 n=5+5)
WordsEncode1e4-8 243MB/s ± 0% 247MB/s ± 1% +1.77% (p=0.008 n=5+5)
WordsEncode1e5-8 185MB/s ± 1% 186MB/s ± 0% ~ (p=0.095 n=5+5)
WordsEncode1e6-8 210MB/s ± 2% 211MB/s ± 0% ~ (p=0.222 n=5+5)
RandomEncode-8 13.2GB/s ± 1% 13.1GB/s ± 2% ~ (p=0.286 n=4+5)
_ZFlat0-8 429MB/s ± 0% 433MB/s ± 0% +0.83% (p=0.016 n=4+5)
_ZFlat1-8 274MB/s ± 0% 276MB/s ± 0% +0.91% (p=0.016 n=4+5)
_ZFlat2-8 13.6GB/s ± 2% 13.8GB/s ± 1% ~ (p=0.095 n=5+5)
_ZFlat3-8 169MB/s ± 0% 170MB/s ± 0% +0.38% (p=0.032 n=4+5)
_ZFlat4-8 3.09GB/s ± 0% 3.09GB/s ± 1% ~ (p=0.905 n=4+5)
_ZFlat5-8 419MB/s ± 5% 427MB/s ± 1% +1.83% (p=0.032 n=5+5)
_ZFlat6-8 189MB/s ± 1% 190MB/s ± 0% +0.68% (p=0.016 n=4+5)
_ZFlat7-8 181MB/s ± 1% 182MB/s ± 0% +0.60% (p=0.008 n=5+5)
_ZFlat8-8 199MB/s ± 1% 200MB/s ± 0% +0.68% (p=0.008 n=5+5)
_ZFlat9-8 175MB/s ± 0% 175MB/s ± 0% ~ (p=0.095 n=5+5)
_ZFlat10-8 507MB/s ± 0% 507MB/s ± 0% ~ (p=0.222 n=5+5)
_ZFlat11-8 277MB/s ± 0% 278MB/s ± 0% +0.38% (p=0.008 n=5+5)
2016-04-23 10:40:18 +10:00
Nigel Tao
774a97396f
Remove the no-longer-used maxOffset constant.
...
It was no longer used as of commit 8939696c "Use the same encoding algorithm as C++ snappy".
2016-04-16 15:15:34 +10:00
Nigel Tao
fa0b0e6289
Eliminate some bounds checks.
...
It seems like a small win:
benchmark old MB/s new MB/s speedup
BenchmarkWordsEncode1e1-4 465.69 471.77 1.01x
BenchmarkWordsEncode1e2-4 60.18 60.27 1.00x
BenchmarkWordsEncode1e3-4 174.42 176.26 1.01x
BenchmarkWordsEncode1e4-4 172.40 175.95 1.02x
BenchmarkWordsEncode1e5-4 134.42 134.86 1.00x
BenchmarkWordsEncode1e6-4 153.09 154.03 1.01x
BenchmarkRandomEncode-4 6504.88 6553.55 1.01x
Benchmark_ZFlat0-4 310.55 313.22 1.01x
Benchmark_ZFlat1-4 198.43 199.73 1.01x
Benchmark_ZFlat2-4 7915.02 8052.65 1.02x
Benchmark_ZFlat3-4 123.07 123.53 1.00x
Benchmark_ZFlat4-4 2220.35 2230.80 1.00x
Benchmark_ZFlat5-4 307.05 309.51 1.01x
Benchmark_ZFlat6-4 136.35 137.19 1.01x
Benchmark_ZFlat7-4 130.67 131.33 1.01x
Benchmark_ZFlat8-4 143.17 144.47 1.01x
Benchmark_ZFlat9-4 125.40 125.85 1.00x
Benchmark_ZFlat10-4 364.30 370.35 1.02x
Benchmark_ZFlat11-4 200.04 199.80 1.00x
2016-04-12 10:37:28 +10:00
Nigel Tao
ef80b33e87
Change the encoder's hash table values from int32 to uint16.
...
Doing s/int32/uint16/ in "var table [maxTableSize]int32" saves 32 KiB of stack
space that needed zeroing. maxTableSize is 1<<14, or 16384.
We couldn't do this before, in commit cc71ae7c, since we didn't have
maxBlockSize = 65536 at the time.
benchmark old MB/s new MB/s speedup
BenchmarkWordsEncode1e1-4 468.81 465.82 0.99x
BenchmarkWordsEncode1e2-4 32.62 60.23 1.85x
BenchmarkWordsEncode1e3-4 138.36 174.61 1.26x
BenchmarkWordsEncode1e4-4 170.79 172.43 1.01x
BenchmarkWordsEncode1e5-4 131.02 134.25 1.02x
BenchmarkWordsEncode1e6-4 149.33 153.05 1.02x
BenchmarkRandomEncode-4 5933.57 6846.03 1.15x
Benchmark_ZFlat0-4 306.98 310.74 1.01x
Benchmark_ZFlat1-4 194.65 198.74 1.02x
Benchmark_ZFlat2-4 6784.51 8110.98 1.20x
Benchmark_ZFlat3-4 64.06 123.43 1.93x
Benchmark_ZFlat4-4 2102.05 2224.84 1.06x
Benchmark_ZFlat5-4 303.89 307.19 1.01x
Benchmark_ZFlat6-4 132.74 136.37 1.03x
Benchmark_ZFlat7-4 126.50 130.72 1.03x
Benchmark_ZFlat8-4 140.35 143.61 1.02x
Benchmark_ZFlat9-4 121.73 125.62 1.03x
Benchmark_ZFlat10-4 360.89 365.62 1.01x
Benchmark_ZFlat11-4 195.92 199.96 1.02x
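The 32 KiB figure quoted above can be checked directly. A standalone sketch (unsafe is used purely to measure the array footprints; the declarations mirror the ones named in the message):

```go
package main

import (
	"fmt"
	"unsafe"
)

const maxTableSize = 1 << 14 // 16384 entries, as in the commit message

// tableSizes returns the byte footprint of the old (int32) and new (uint16)
// hash table declarations.
func tableSizes() (oldSize, newSize uintptr) {
	var t32 [maxTableSize]int32
	var t16 [maxTableSize]uint16
	return unsafe.Sizeof(t32), unsafe.Sizeof(t16)
}

func main() {
	oldSize, newSize := tableSizes()
	// 16384 entries at 4 vs 2 bytes each: 64 KiB vs 32 KiB.
	fmt.Printf("int32 table: %d KiB, uint16 table: %d KiB\n", oldSize>>10, newSize>>10)
}
```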
2016-04-11 12:06:28 +10:00
Nigel Tao
9bc0b5ad10
Make heuristic match skipping more aggressive.
...
This is the Go equivalent of an algorithmic change in the C++ snappy
code:
d53de18799
The discussion is at:
https://groups.google.com/d/topic/snappy-compression/3Qa3fASLkNA/discussion
benchmark old MB/s new MB/s speedup
BenchmarkWordsEncode1e1-8 680.57 679.12 1.00x
BenchmarkWordsEncode1e2-8 49.90 49.65 0.99x
BenchmarkWordsEncode1e3-8 213.28 212.75 1.00x
BenchmarkWordsEncode1e4-8 247.05 245.76 0.99x
BenchmarkWordsEncode1e5-8 180.68 179.95 1.00x
BenchmarkWordsEncode1e6-8 205.65 204.83 1.00x
BenchmarkRandomEncode-8 5678.83 11217.33 1.98x
Benchmark_ZFlat0-8 422.83 423.18 1.00x
Benchmark_ZFlat1-8 269.60 271.01 1.01x
Benchmark_ZFlat2-8 5517.16 11517.40 2.09x
Benchmark_ZFlat3-8 92.47 92.39 1.00x
Benchmark_ZFlat4-8 954.63 2947.73 3.09x
Benchmark_ZFlat5-8 419.71 419.87 1.00x
Benchmark_ZFlat6-8 184.13 183.45 1.00x
Benchmark_ZFlat7-8 175.83 175.89 1.00x
Benchmark_ZFlat8-8 193.49 193.84 1.00x
Benchmark_ZFlat9-8 169.02 168.59 1.00x
Benchmark_ZFlat10-8 500.19 499.85 1.00x
Benchmark_ZFlat11-8 271.20 270.60 1.00x
2016-04-10 16:02:28 +10:00
Nigel Tao
cef980a12b
Add more commentary to minNonLiteralBlockSize.
2016-04-07 15:15:05 +10:00
Nigel Tao
6218a584d0
Clarify the semantics of minNonLiteralBlockSize.
2016-04-07 15:09:02 +10:00
Nigel Tao
8939696c22
Use the same encoding algorithm as C++ snappy.
...
When encoding the benchmark files, the output size is smaller:
len(in) old_len(out) new_len(out) new/old_ratio description
102400 23488 22842 0.97 html
702087 346345 335387 0.97 urls
123093 123034 123034 1.00 jpg
200 144 146 1.01 jpg_200
102400 83786 83817 1.00 pdf
409600 95095 92221 0.97 html4
152089 91386 88017 0.96 txt1
125179 80526 77525 0.96 txt2
426754 244658 234392 0.96 txt3
481861 331356 319097 0.96 txt4
118588 24789 23295 0.94 pb
184320 74129 69526 0.94 gaviota
On GOARCH=amd64, the throughput numbers are also faster:
benchmark old MB/s new MB/s speedup
BenchmarkWordsEncode1e1-8 674.93 681.22 1.01x
BenchmarkWordsEncode1e2-8 47.92 49.91 1.04x
BenchmarkWordsEncode1e3-8 189.48 213.64 1.13x
BenchmarkWordsEncode1e4-8 193.17 245.31 1.27x
BenchmarkWordsEncode1e5-8 151.44 178.84 1.18x
BenchmarkWordsEncode1e6-8 180.63 203.74 1.13x
BenchmarkRandomEncode-8 4700.25 5711.91 1.22x
Benchmark_ZFlat0-8 372.12 422.42 1.14x
Benchmark_ZFlat1-8 187.62 270.16 1.44x
Benchmark_ZFlat2-8 4891.26 5542.08 1.13x
Benchmark_ZFlat3-8 86.16 92.53 1.07x
Benchmark_ZFlat4-8 570.31 963.51 1.69x
Benchmark_ZFlat5-8 366.84 418.91 1.14x
Benchmark_ZFlat6-8 164.18 182.67 1.11x
Benchmark_ZFlat7-8 155.23 175.64 1.13x
Benchmark_ZFlat8-8 169.62 193.08 1.14x
Benchmark_ZFlat9-8 149.43 168.62 1.13x
Benchmark_ZFlat10-8 412.63 497.87 1.21x
Benchmark_ZFlat11-8 247.98 269.43 1.09x
2016-04-03 11:18:01 +10:00
Nigel Tao
ebebc71721
Raise the "always encode as literal" size threshold from 4 to 14.
...
This isn't an optimization per se, although it does trade off the
"encode 10 bytes" benchmark to favor speed over output size. The point
of this commit is to move closer to what the C++ snappy code does.
benchmark old MB/s new MB/s speedup
BenchmarkWordsEncode1e1-8 5.77 674.93 116.97x
BenchmarkWordsEncode1e2-8 47.96 47.92 1.00x
BenchmarkWordsEncode1e3-8 190.33 189.48 1.00x
BenchmarkWordsEncode1e4-8 190.25 193.17 1.02x
BenchmarkWordsEncode1e5-8 150.65 151.44 1.01x
BenchmarkWordsEncode1e6-8 180.11 180.63 1.00x
BenchmarkRandomEncode-8 4782.70 4700.25 0.98x
Benchmark_ZFlat0-8 372.49 372.12 1.00x
Benchmark_ZFlat1-8 186.49 187.62 1.01x
Benchmark_ZFlat2-8 4979.47 4891.26 0.98x
Benchmark_ZFlat3-8 85.76 86.16 1.00x
Benchmark_ZFlat4-8 566.31 570.31 1.01x
Benchmark_ZFlat5-8 366.01 366.84 1.00x
Benchmark_ZFlat6-8 162.13 164.18 1.01x
Benchmark_ZFlat7-8 153.69 155.23 1.01x
Benchmark_ZFlat8-8 167.91 169.62 1.01x
Benchmark_ZFlat9-8 147.71 149.43 1.01x
Benchmark_ZFlat10-8 414.06 412.63 1.00x
Benchmark_ZFlat11-8 248.87 247.98 1.00x
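The 116.97x jump on the 10-byte benchmark follows directly from the new check: such inputs now bypass match-finding entirely. A minimal sketch of that check (the constant and names come from this commit's description, not its exact code):

```go
package main

import "fmt"

// maxLiteralOnly is the raised threshold: inputs of this many bytes or fewer
// are emitted as a single literal with no attempt at finding matches.
const maxLiteralOnly = 14

// encodeAsLiteralOnly reports whether the encoder should skip match-finding
// entirely for src.
func encodeAsLiteralOnly(src []byte) bool {
	return len(src) <= maxLiteralOnly
}

func main() {
	fmt.Println(encodeAsLiteralOnly(make([]byte, 10)))  // the 1e1 benchmark case
	fmt.Println(encodeAsLiteralOnly(make([]byte, 100))) // the 1e2 case still searches
}
```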
2016-04-03 09:25:18 +10:00
Nigel Tao
7ede8d1b13
Eliminate some bounds checks in the encoder.
...
As per
https://groups.google.com/d/msg/golang-dev/jVP6h21OyL8/Syhfot9XBQAJ,
recent versions of the gc compiler can optimize:
func load32(b []byte, i int32) uint32 {
	b = b[i : i+4 : len(b)]
	return uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24
}
benchmark old MB/s new MB/s speedup
BenchmarkWordsEncode1e1-8 5.78 5.77 1.00x
BenchmarkWordsEncode1e2-8 47.22 47.96 1.02x
BenchmarkWordsEncode1e3-8 183.53 190.33 1.04x
BenchmarkWordsEncode1e4-8 198.95 190.25 0.96x
BenchmarkWordsEncode1e5-8 144.60 150.65 1.04x
BenchmarkWordsEncode1e6-8 172.11 180.11 1.05x
BenchmarkRandomEncode-8 4547.98 4782.70 1.05x
Benchmark_ZFlat0-8 359.18 372.49 1.04x
Benchmark_ZFlat1-8 181.57 186.49 1.03x
Benchmark_ZFlat2-8 4566.75 4979.47 1.09x
Benchmark_ZFlat3-8 86.00 85.76 1.00x
Benchmark_ZFlat4-8 558.08 566.31 1.01x
Benchmark_ZFlat5-8 354.18 366.01 1.03x
Benchmark_ZFlat6-8 156.20 162.13 1.04x
Benchmark_ZFlat7-8 147.76 153.69 1.04x
Benchmark_ZFlat8-8 162.49 167.91 1.03x
Benchmark_ZFlat9-8 142.33 147.71 1.04x
Benchmark_ZFlat10-8 401.93 414.06 1.03x
Benchmark_ZFlat11-8 235.94 248.87 1.05x
2016-04-03 08:30:29 +10:00
Nigel Tao
d1f56d2222
Encode copies of length 65, 66 or 67 as 5 bytes, not 6.
...
The benchmarks don't show a big change either way, but the output is
shorter, and it matches what the C++ snappy code does.
benchmark old MB/s new MB/s speedup
BenchmarkWordsEncode1e1-8 5.77 5.77 1.00x
BenchmarkWordsEncode1e2-8 47.15 47.26 1.00x
BenchmarkWordsEncode1e3-8 180.77 183.25 1.01x
BenchmarkWordsEncode1e4-8 202.01 198.96 0.98x
BenchmarkWordsEncode1e5-8 145.66 144.68 0.99x
BenchmarkWordsEncode1e6-8 174.12 172.31 0.99x
BenchmarkRandomEncode-8 4522.91 4495.78 0.99x
Benchmark_ZFlat0-8 359.70 359.79 1.00x
Benchmark_ZFlat1-8 181.18 181.82 1.00x
Benchmark_ZFlat2-8 4612.52 4557.46 0.99x
Benchmark_ZFlat3-8 85.65 84.82 0.99x
Benchmark_ZFlat4-8 559.51 558.52 1.00x
Benchmark_ZFlat5-8 354.88 352.91 0.99x
Benchmark_ZFlat6-8 156.14 156.26 1.00x
Benchmark_ZFlat7-8 148.18 148.12 1.00x
Benchmark_ZFlat8-8 162.68 162.21 1.00x
Benchmark_ZFlat9-8 141.81 142.32 1.00x
Benchmark_ZFlat10-8 399.79 401.94 1.01x
Benchmark_ZFlat11-8 237.43 235.91 0.99x
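The saving falls out of the Snappy wire format: the 2-byte copy form covers lengths 4..11 (with offsets below 2048), so splitting a 65..67 byte match as 60 plus a 5..7 byte tail beats splitting it as 64 plus a 1..3 byte tail, which presumably forced two 3-byte copies. This sketch computes the encoded cost; the loop shape mirrors common emitCopy implementations, not necessarily this commit's exact code:

```go
package main

import "fmt"

// copyBytes returns the encoded size, in bytes, of the copy ops for a match
// of the given length, assuming the offset is below 2048 so the 2-byte copy
// form is available for a length 4..11 tail.
func copyBytes(length int) int {
	n := 0
	for length >= 68 {
		n += 3 // a length-64 copy in the 3-byte form
		length -= 64
	}
	if length > 64 {
		n += 3 // a length-60 copy leaves a 5..7 byte tail...
		length -= 60
	}
	if length >= 4 && length < 12 {
		return n + 2 // ...which fits the 2-byte copy form
	}
	return n + 3
}

func main() {
	// A length-65 match split 64+1 costs 3+3 = 6 bytes; split 60+5 it
	// costs 3+2 = 5.
	fmt.Println(copyBytes(65))
}
```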
2016-04-02 16:22:40 +11:00
Nigel Tao
624b11c0e0
Fix some comment styles.
2016-02-25 15:30:12 +11:00
Nigel Tao
bf2ded9d81
Use 64K blocks when encoding long inputs.
...
This enables future optimizations, such as an encoder's hash table entry being
uint16 instead of int32.
2016-02-22 12:44:36 +11:00
Nigel Tao
d1d908a252
Fix heuristic match skipping.
...
The heuristic was introduced in 4e2aa98e, based on the C++ Snappy
implementation, but the Go code contained a flawed optimization. The C++ code
used an explicit skip variable:
uint32 bytes_between_hash_lookups = skip++ >> 5;
next_ip = ip + bytes_between_hash_lookups;
whereas the Go code optimized this to be an implicit skip:
s += 1 + (s-lit)>>5
This is equivalent for small s values (relative to lit, the last hash table
hit), but diverges for large ones. This Go program demonstrates the difference:
package main

import "fmt"

// main prints the encoder skipping behavior when seeing no hash table hits.
func main() {
	s0, s1 := 0, 0
	skip := 32
	for i := 0; i < 300; i++ {
		// This is the C++ Snappy algorithm.
		bytes_between_hash_lookups := skip >> 5
		skip++
		s0 += bytes_between_hash_lookups
		// This is the Go Snappy algorithm.
		s1 += 1 + s1>>5
		// The intention was that the Go algorithm behaves the same as the
		// C++ one, but it doesn't.
		if i%10 == 0 {
			fmt.Printf("%d\t%d\t%d\n", i, s0, s1)
		}
	}
}
It prints:
0 1 1
10 11 11
20 21 21
30 31 31
40 50 50
50 70 73
60 90 105
70 117 149
80 147 208
90 177 288
100 212 398
110 252 548
120 292 752
130 335 1030
140 385 1408
150 435 1922
160 486 2619
170 546 3568
180 606 4861
190 666 6617
200 735 9005
210 805 12257
220 875 16681
230 952 22697
240 1032 30881
250 1112 42015
260 1197 57161
270 1287 77764
280 1377 105791
290 1470 143914
The C++ algorithm is quadratic: after i consecutive misses, s0 has advanced
roughly i + i²/64 bytes. The Go algorithm is exponential: each miss multiplies
s1 by roughly 33/32.
This commit re-introduces the explicit skip variable, so that the Go
implementation matches the C++ implementation.
For completeness, benchmark numbers are included below, but the worse numbers
merely reflect that the old Go algorithm was too aggressive on skipping ahead
on incompressible input (RandomEncode, ZFlat2 and ZFlat4), and so after an
initial warm-up period, it was essentially performing not much more than a
memcpy. Memcpy is indeed fast in terms of MB/s, but it doesn't compress at all,
which obviously defeats the whole purpose of a compression format like Snappy.
benchmark old MB/s new MB/s speedup
BenchmarkWordsEncode1e1-4 3.65 3.77 1.03x
BenchmarkWordsEncode1e2-4 29.22 29.35 1.00x
BenchmarkWordsEncode1e3-4 99.46 101.20 1.02x
BenchmarkWordsEncode1e4-4 118.11 121.54 1.03x
BenchmarkWordsEncode1e5-4 90.37 91.72 1.01x
BenchmarkWordsEncode1e6-4 107.49 108.88 1.01x
BenchmarkRandomEncode-4 7679.09 4491.97 0.58x
Benchmark_ZFlat0-4 229.41 233.79 1.02x
Benchmark_ZFlat1-4 115.10 116.83 1.02x
Benchmark_ZFlat2-4 7256.88 3003.79 0.41x
Benchmark_ZFlat3-4 53.39 54.02 1.01x
Benchmark_ZFlat4-4 1873.63 289.28 0.15x
Benchmark_ZFlat5-4 233.29 234.95 1.01x
Benchmark_ZFlat6-4 101.33 102.79 1.01x
Benchmark_ZFlat7-4 95.26 96.63 1.01x
Benchmark_ZFlat8-4 105.66 106.89 1.01x
Benchmark_ZFlat9-4 92.04 93.11 1.01x
Benchmark_ZFlat10-4 265.68 265.93 1.00x
Benchmark_ZFlat11-4 149.72 151.32 1.01x
These numbers were generated on an amd64 machine, but on a different machine
than the one used for other recent commits. The raw MB/s numbers are therefore
not directly comparable, although the speedup numbers should be.
2016-02-14 16:54:35 +11:00
Nigel Tao
c2359a1bd0
Catch MaxEncodedLen overflow.
2016-02-13 20:09:53 +11:00
Nigel Tao
cc71ae7cc5
Change the encoder's hash table values from int to int32.
...
Doing s/int/int32/ in "var table [maxTableSize]int" saves 64 KiB of
stack space that needed zeroing. maxTableSize is 1<<14, or 16384.
The benchmarks show the biggest effect for small src lengths, or for
mostly uncompressible data such as the JPEG file (possibly because the
multiple-byte skipping means that the src is effectively short).
On amd64:
benchmark old MB/s new MB/s speedup
BenchmarkWordsEncode1e1-8 3.05 5.71 1.87x
BenchmarkWordsEncode1e2-8 26.98 44.87 1.66x
BenchmarkWordsEncode1e3-8 130.87 156.72 1.20x
BenchmarkWordsEncode1e4-8 162.48 180.89 1.11x
BenchmarkWordsEncode1e5-8 132.35 131.27 0.99x
BenchmarkWordsEncode1e6-8 159.97 158.49 0.99x
BenchmarkRandomEncode-8 12340.86 13485.69 1.09x
Benchmark_ZFlat0-8 329.92 329.17 1.00x
Benchmark_ZFlat1-8 165.06 164.46 1.00x
Benchmark_ZFlat2-8 8955.25 10530.49 1.18x
Benchmark_ZFlat3-8 47.79 80.06 1.68x
Benchmark_ZFlat4-8 2650.55 2732.00 1.03x
Benchmark_ZFlat5-8 336.52 334.94 1.00x
Benchmark_ZFlat6-8 147.99 145.85 0.99x
Benchmark_ZFlat7-8 136.32 137.20 1.01x
Benchmark_ZFlat8-8 153.03 152.15 0.99x
Benchmark_ZFlat9-8 133.18 131.74 0.99x
Benchmark_ZFlat10-8 376.02 378.28 1.01x
Benchmark_ZFlat11-8 224.16 216.81 0.97x
Thanks to Klaus Post for the original suggestion on
https://github.com/golang/snappy/pull/23, but I hesitate to accept that
pull request in its entirety, as it makes many changes, some more
complicated than this separable, self-contained s/int/int32/ change.
2016-02-13 14:11:38 +11:00
Nigel Tao
07070fd417
Catch overflow when incrementing src pointers.
2016-02-12 16:49:07 +11:00
Nigel Tao
799c780093
Reduce the number of Write calls to the underlying io.Writer.
2016-02-11 17:10:40 +11:00
Nigel Tao
0fd139378b
Add NewBufferedWriter, and Flush and Close methods.
...
Deprecate NewWriter.
See the discussion on
https://groups.google.com/d/topic/golang-dev/nXp12KmMSvM/discussion
2016-02-11 15:44:51 +11:00
Nigel Tao
4e2aa98ebb
Skip multiple bytes if the last match was >= 32 bytes prior.
...
benchmark old MB/s new MB/s speedup
BenchmarkWordsEncode1e3-8 137.99 132.57 0.96x
BenchmarkWordsEncode1e4-8 173.30 156.26 0.90x
BenchmarkWordsEncode1e5-8 137.16 132.59 0.97x
BenchmarkWordsEncode1e6-8 165.45 164.47 0.99x
BenchmarkRandomEncode-8 140.04 12260.44 87.55x
Benchmark_ZFlat0-8 334.14 335.84 1.01x
Benchmark_ZFlat1-8 168.93 168.19 1.00x
Benchmark_ZFlat2-8 134.42 8763.96 65.20x
Benchmark_ZFlat3-8 48.04 47.36 0.99x
Benchmark_ZFlat4-8 151.86 2578.12 16.98x
Benchmark_ZFlat5-8 344.43 341.94 0.99x
Benchmark_ZFlat6-8 149.21 147.24 0.99x
Benchmark_ZFlat7-8 140.87 138.72 0.98x
Benchmark_ZFlat8-8 155.95 155.89 1.00x
Benchmark_ZFlat9-8 135.05 136.07 1.01x
Benchmark_ZFlat10-8 380.98 379.77 1.00x
Benchmark_ZFlat11-8 227.48 226.59 1.00x
Thanks to Klaus Post for the original suggestion. Unfortunately,
https://github.com/golang/snappy/pull/19 was abandoned.
2016-02-10 14:31:44 +11:00
Damian Gryski
ec7b924342
C++ snappy has moved to github
2015-07-21 09:14:30 +02:00
Nigel Tao
2a6d64140d
Have Encode return []byte instead of ([]byte, error).
...
Encoding can never fail, and returning an error is inconsistent with the
standard library's encoding/{ascii85,hex,pem} packages.
Fixes #8.
2015-07-17 17:21:07 +10:00
Sebastien Binet
c5eccb269a
all: simpler import path
2015-06-11 10:18:30 +02:00