snappy

Commit graph

Author	SHA1	Message	Date
Nigel Tao	ff6b7dc882	Add comments re handling block and stream formats	2019-09-04 16:35:34 +10:00
fatedier	0d9c4c05f1	fix typo	2017-01-25 15:07:54 +08:00
Nigel Tao	988ce01844	Add a fast path for short emitLiteral calls. Compared to the previous commit: name old speed new speed delta WordsEncode1e1-8 667MB/s ± 0% 677MB/s ± 1% +1.57% (p=0.008 n=5+5) WordsEncode1e2-8 353MB/s ± 1% 428MB/s ± 0% +21.37% (p=0.008 n=5+5) WordsEncode1e3-8 383MB/s ± 1% 446MB/s ± 1% +16.65% (p=0.008 n=5+5) WordsEncode1e4-8 277MB/s ± 1% 316MB/s ± 0% +13.93% (p=0.008 n=5+5) WordsEncode1e5-8 248MB/s ± 0% 269MB/s ± 0% +8.57% (p=0.008 n=5+5) WordsEncode1e6-8 296MB/s ± 0% 314MB/s ± 1% +6.08% (p=0.008 n=5+5) RandomEncode-8 14.4GB/s ± 2% 14.4GB/s ± 1% ~ (p=1.000 n=5+5) _ZFlat0-8 748MB/s ± 0% 792MB/s ± 0% +5.87% (p=0.008 n=5+5) _ZFlat1-8 406MB/s ± 0% 436MB/s ± 1% +7.42% (p=0.008 n=5+5) _ZFlat2-8 16.1GB/s ± 1% 16.2GB/s ± 1% ~ (p=0.421 n=5+5) _ZFlat3-8 604MB/s ± 0% 632MB/s ± 1% +4.49% (p=0.008 n=5+5) _ZFlat4-8 7.62GB/s ± 1% 8.00GB/s ± 0% +5.03% (p=0.008 n=5+5) _ZFlat5-8 729MB/s ± 0% 768MB/s ± 0% +5.26% (p=0.008 n=5+5) _ZFlat6-8 267MB/s ± 0% 282MB/s ± 1% +5.92% (p=0.008 n=5+5) _ZFlat7-8 248MB/s ± 0% 264MB/s ± 1% +6.48% (p=0.008 n=5+5) _ZFlat8-8 282MB/s ± 0% 298MB/s ± 0% +5.87% (p=0.008 n=5+5) _ZFlat9-8 231MB/s ± 0% 247MB/s ± 0% +6.79% (p=0.008 n=5+5) _ZFlat10-8 972MB/s ± 0% 1027MB/s ± 0% +5.64% (p=0.008 n=5+5) _ZFlat11-8 401MB/s ± 0% 411MB/s ± 0% +2.43% (p=0.008 n=5+5) The net effect of the past three commits, when compared to just before `68801229` "Write the encoder's encodeBlock in asm": name old speed new speed delta WordsEncode1e1-8 665MB/s ± 0% 677MB/s ± 1% +1.80% (p=0.016 n=4+5) WordsEncode1e2-8 85.0MB/s ± 0% 428.3MB/s ± 0% +403.65% (p=0.016 n=4+5) WordsEncode1e3-8 234MB/s ± 2% 446MB/s ± 1% +90.90% (p=0.008 n=5+5) WordsEncode1e4-8 233MB/s ± 0% 316MB/s ± 0% +35.22% (p=0.008 n=5+5) WordsEncode1e5-8 214MB/s ± 1% 269MB/s ± 0% +25.45% (p=0.008 n=5+5) WordsEncode1e6-8 258MB/s ± 0% 314MB/s ± 1% +21.82% (p=0.008 n=5+5) RandomEncode-8 13.1GB/s ± 1% 14.4GB/s ± 1% +10.31% (p=0.008 n=5+5) _ZFlat0-8 630MB/s ± 0% 792MB/s ± 0% +25.71% (p=0.016 n=4+5) _ZFlat1-8 
326MB/s ± 0% 436MB/s ± 1% +33.89% (p=0.016 n=4+5) _ZFlat2-8 13.9GB/s ± 1% 16.2GB/s ± 1% +16.27% (p=0.008 n=5+5) _ZFlat3-8 177MB/s ± 1% 632MB/s ± 1% +257.58% (p=0.008 n=5+5) _ZFlat4-8 6.19GB/s ± 1% 8.00GB/s ± 0% +29.32% (p=0.008 n=5+5) _ZFlat5-8 615MB/s ± 0% 768MB/s ± 0% +24.91% (p=0.008 n=5+5) _ZFlat6-8 231MB/s ± 0% 282MB/s ± 1% +21.95% (p=0.008 n=5+5) _ZFlat7-8 215MB/s ± 1% 264MB/s ± 1% +22.83% (p=0.008 n=5+5) _ZFlat8-8 246MB/s ± 0% 298MB/s ± 0% +21.46% (p=0.008 n=5+5) _ZFlat9-8 202MB/s ± 0% 247MB/s ± 0% +22.17% (p=0.008 n=5+5) _ZFlat10-8 803MB/s ± 0% 1027MB/s ± 0% +27.93% (p=0.008 n=5+5) _ZFlat11-8 351MB/s ± 0% 411MB/s ± 0% +16.92% (p=0.008 n=5+5)	2016-04-23 14:49:49 +10:00
Nigel Tao	6880122951	Write the encoder's encodeBlock in asm. name old speed new speed delta WordsEncode1e1-8 665MB/s ± 0% 678MB/s ± 0% +2.00% (p=0.016 n=4+5) WordsEncode1e2-8 85.0MB/s ± 0% 90.1MB/s ± 0% +5.90% (p=0.016 n=4+5) WordsEncode1e3-8 234MB/s ± 2% 295MB/s ± 0% +26.20% (p=0.008 n=5+5) WordsEncode1e4-8 233MB/s ± 0% 276MB/s ± 0% +18.31% (p=0.008 n=5+5) WordsEncode1e5-8 214MB/s ± 1% 248MB/s ± 0% +15.52% (p=0.008 n=5+5) WordsEncode1e6-8 258MB/s ± 0% 295MB/s ± 0% +14.62% (p=0.008 n=5+5) RandomEncode-8 13.1GB/s ± 1% 14.4GB/s ± 1% +10.27% (p=0.008 n=5+5) _ZFlat0-8 630MB/s ± 0% 749MB/s ± 0% +18.96% (p=0.016 n=4+5) _ZFlat1-8 326MB/s ± 0% 405MB/s ± 0% +24.41% (p=0.029 n=4+4) _ZFlat2-8 13.9GB/s ± 1% 16.2GB/s ± 1% +16.04% (p=0.008 n=5+5) _ZFlat3-8 177MB/s ± 1% 202MB/s ± 1% +14.51% (p=0.008 n=5+5) _ZFlat4-8 6.19GB/s ± 1% 7.59GB/s ± 1% +22.64% (p=0.008 n=5+5) _ZFlat5-8 615MB/s ± 0% 728MB/s ± 1% +18.45% (p=0.008 n=5+5) _ZFlat6-8 231MB/s ± 0% 266MB/s ± 1% +15.00% (p=0.008 n=5+5) _ZFlat7-8 215MB/s ± 1% 248MB/s ± 0% +15.30% (p=0.008 n=5+5) _ZFlat8-8 246MB/s ± 0% 282MB/s ± 0% +14.73% (p=0.016 n=5+4) _ZFlat9-8 202MB/s ± 0% 231MB/s ± 0% +14.13% (p=0.008 n=5+5) _ZFlat10-8 803MB/s ± 0% 970MB/s ± 0% +20.90% (p=0.008 n=5+5) _ZFlat11-8 351MB/s ± 0% 402MB/s ± 0% +14.29% (p=0.008 n=5+5)	2016-04-23 14:21:06 +10:00
Nigel Tao	17e435849f	Restrict the scope of the tableSize variable. It's really just a style change, not an optimization, but for the record, the numbers don't show a strong change either way, and could easily just be noise. name old speed new speed delta WordsEncode1e1-8 667MB/s ± 0% 665MB/s ± 0% ~ (p=0.190 n=5+4) WordsEncode1e2-8 85.1MB/s ± 0% 85.0MB/s ± 0% ~ (p=0.556 n=5+4) WordsEncode1e3-8 235MB/s ± 0% 234MB/s ± 2% ~ (p=0.690 n=5+5) WordsEncode1e4-8 234MB/s ± 0% 233MB/s ± 0% ~ (p=0.151 n=5+5) WordsEncode1e5-8 216MB/s ± 0% 214MB/s ± 1% -0.61% (p=0.008 n=5+5) WordsEncode1e6-8 258MB/s ± 0% 258MB/s ± 0% -0.29% (p=0.024 n=5+5) RandomEncode-8 13.2GB/s ± 1% 13.1GB/s ± 1% ~ (p=0.056 n=5+5) _ZFlat0-8 629MB/s ± 0% 630MB/s ± 0% ~ (p=0.111 n=5+4) _ZFlat1-8 325MB/s ± 0% 326MB/s ± 0% +0.27% (p=0.016 n=5+4) _ZFlat2-8 13.7GB/s ± 5% 13.9GB/s ± 1% ~ (p=0.310 n=5+5) _ZFlat3-8 177MB/s ± 0% 177MB/s ± 1% ~ (p=0.690 n=5+5) _ZFlat4-8 6.15GB/s ± 2% 6.19GB/s ± 1% ~ (p=0.222 n=5+5) _ZFlat5-8 614MB/s ± 0% 615MB/s ± 0% ~ (p=0.310 n=5+5) _ZFlat6-8 231MB/s ± 2% 231MB/s ± 0% ~ (p=0.690 n=5+5) _ZFlat7-8 215MB/s ± 2% 215MB/s ± 1% ~ (p=0.222 n=5+5) _ZFlat8-8 246MB/s ± 0% 246MB/s ± 0% ~ (p=0.190 n=4+5) _ZFlat9-8 202MB/s ± 0% 202MB/s ± 0% ~ (p=0.683 n=4+5) _ZFlat10-8 794MB/s ± 2% 803MB/s ± 0% +1.13% (p=0.008 n=5+5) _ZFlat11-8 350MB/s ± 0% 351MB/s ± 0% +0.25% (p=0.032 n=4+5)	2016-04-23 12:28:42 +10:00
Nigel Tao	0c43e98dfe	Add comment that dst and src must not overlap.	2016-04-23 12:17:41 +10:00
Nigel Tao	62bb72da9a	Write the encoder's emitLiteral in asm. name old speed new speed delta WordsEncode1e1-8 665MB/s ± 0% 667MB/s ± 0% +0.35% (p=0.008 n=5+5) WordsEncode1e2-8 83.8MB/s ± 1% 85.1MB/s ± 0% +1.47% (p=0.008 n=5+5) WordsEncode1e3-8 231MB/s ± 1% 235MB/s ± 0% +1.81% (p=0.008 n=5+5) WordsEncode1e4-8 232MB/s ± 1% 234MB/s ± 0% +0.78% (p=0.016 n=5+5) WordsEncode1e5-8 212MB/s ± 1% 216MB/s ± 0% +1.55% (p=0.008 n=5+5) WordsEncode1e6-8 257MB/s ± 0% 258MB/s ± 0% +0.68% (p=0.008 n=5+5) RandomEncode-8 13.2GB/s ± 1% 13.2GB/s ± 1% ~ (p=0.548 n=5+5) _ZFlat0-8 629MB/s ± 0% 629MB/s ± 0% ~ (p=0.690 n=5+5) _ZFlat1-8 324MB/s ± 0% 325MB/s ± 0% ~ (p=0.222 n=5+5) _ZFlat2-8 13.9GB/s ± 1% 13.7GB/s ± 5% ~ (p=0.310 n=5+5) _ZFlat3-8 176MB/s ± 1% 177MB/s ± 0% ~ (p=0.548 n=5+5) _ZFlat4-8 6.12GB/s ± 0% 6.15GB/s ± 2% ~ (p=0.151 n=5+5) _ZFlat5-8 614MB/s ± 0% 614MB/s ± 0% ~ (p=0.548 n=5+5) _ZFlat6-8 230MB/s ± 0% 231MB/s ± 2% ~ (p=0.151 n=5+5) _ZFlat7-8 214MB/s ± 0% 215MB/s ± 2% ~ (p=0.151 n=5+5) _ZFlat8-8 244MB/s ± 0% 246MB/s ± 0% +0.71% (p=0.016 n=5+4) _ZFlat9-8 200MB/s ± 0% 202MB/s ± 0% +0.95% (p=0.016 n=5+4) _ZFlat10-8 797MB/s ± 0% 794MB/s ± 2% ~ (p=1.000 n=5+5) _ZFlat11-8 351MB/s ± 1% 350MB/s ± 0% ~ (p=0.730 n=5+4)	2016-04-23 12:05:56 +10:00
Nigel Tao	d8211ff0ee	Write the encoder's emitCopy in asm. name old speed new speed delta WordsEncode1e1-8 690MB/s ± 0% 665MB/s ± 0% -3.64% (p=0.008 n=5+5) WordsEncode1e2-8 83.7MB/s ± 1% 83.8MB/s ± 1% ~ (p=0.421 n=5+5) WordsEncode1e3-8 230MB/s ± 1% 231MB/s ± 1% ~ (p=0.421 n=5+5) WordsEncode1e4-8 233MB/s ± 1% 232MB/s ± 1% ~ (p=0.151 n=5+5) WordsEncode1e5-8 212MB/s ± 0% 212MB/s ± 1% ~ (p=1.000 n=5+5) WordsEncode1e6-8 255MB/s ± 0% 257MB/s ± 0% +0.57% (p=0.008 n=5+5) RandomEncode-8 13.2GB/s ± 1% 13.2GB/s ± 1% ~ (p=0.151 n=5+5) _ZFlat0-8 623MB/s ± 0% 629MB/s ± 0% +0.93% (p=0.008 n=5+5) _ZFlat1-8 319MB/s ± 1% 324MB/s ± 0% +1.65% (p=0.008 n=5+5) _ZFlat2-8 13.9GB/s ± 1% 13.9GB/s ± 1% ~ (p=0.548 n=5+5) _ZFlat3-8 176MB/s ± 0% 176MB/s ± 1% ~ (p=0.690 n=5+5) _ZFlat4-8 6.05GB/s ± 0% 6.12GB/s ± 0% +1.20% (p=0.008 n=5+5) _ZFlat5-8 603MB/s ± 0% 614MB/s ± 0% +1.71% (p=0.008 n=5+5) _ZFlat6-8 228MB/s ± 0% 230MB/s ± 0% +0.83% (p=0.008 n=5+5) _ZFlat7-8 212MB/s ± 0% 214MB/s ± 0% +0.74% (p=0.008 n=5+5) _ZFlat8-8 242MB/s ± 0% 244MB/s ± 0% +0.99% (p=0.008 n=5+5) _ZFlat9-8 199MB/s ± 1% 200MB/s ± 0% +0.57% (p=0.008 n=5+5) _ZFlat10-8 796MB/s ± 1% 797MB/s ± 0% ~ (p=1.000 n=5+5) _ZFlat11-8 348MB/s ± 0% 351MB/s ± 1% ~ (p=0.056 n=5+5) I'm not overly worried about the WordsEncode1e1-8 change: the time/op is around 15 nanoseconds, which is tiny. In comparison, _ZFlat0-8 takes around 163 microseconds (note µs not ns).	2016-04-23 11:48:21 +10:00
Nigel Tao	4f2f9a13dd	Write the encoder's extendMatch in asm. name old speed new speed delta WordsEncode1e1-8 678MB/s ± 0% 690MB/s ± 0% +1.79% (p=0.008 n=5+5) WordsEncode1e2-8 87.5MB/s ± 0% 83.7MB/s ± 1% -4.26% (p=0.008 n=5+5) WordsEncode1e3-8 257MB/s ± 1% 230MB/s ± 1% -10.41% (p=0.008 n=5+5) WordsEncode1e4-8 247MB/s ± 1% 233MB/s ± 1% -5.56% (p=0.008 n=5+5) WordsEncode1e5-8 186MB/s ± 0% 212MB/s ± 0% +14.36% (p=0.008 n=5+5) WordsEncode1e6-8 211MB/s ± 0% 255MB/s ± 0% +20.82% (p=0.008 n=5+5) RandomEncode-8 13.1GB/s ± 2% 13.2GB/s ± 1% ~ (p=0.222 n=5+5) _ZFlat0-8 433MB/s ± 0% 623MB/s ± 0% +43.92% (p=0.008 n=5+5) _ZFlat1-8 276MB/s ± 0% 319MB/s ± 1% +15.42% (p=0.008 n=5+5) _ZFlat2-8 13.8GB/s ± 1% 13.9GB/s ± 1% ~ (p=0.222 n=5+5) _ZFlat3-8 170MB/s ± 0% 176MB/s ± 0% +3.55% (p=0.008 n=5+5) _ZFlat4-8 3.09GB/s ± 1% 6.05GB/s ± 0% +96.00% (p=0.008 n=5+5) _ZFlat5-8 427MB/s ± 1% 603MB/s ± 0% +41.35% (p=0.008 n=5+5) _ZFlat6-8 190MB/s ± 0% 228MB/s ± 0% +20.24% (p=0.008 n=5+5) _ZFlat7-8 182MB/s ± 0% 212MB/s ± 0% +16.87% (p=0.008 n=5+5) _ZFlat8-8 200MB/s ± 0% 242MB/s ± 0% +20.97% (p=0.008 n=5+5) _ZFlat9-8 175MB/s ± 0% 199MB/s ± 1% +13.74% (p=0.008 n=5+5) _ZFlat10-8 507MB/s ± 0% 796MB/s ± 1% +56.83% (p=0.008 n=5+5) _ZFlat11-8 278MB/s ± 0% 348MB/s ± 0% +25.09% (p=0.008 n=5+5) name old time/op new time/op delta ExtendMatch-8 16.5µs ± 1% 7.8µs ± 1% -52.93% (p=0.008 n=5+5)	2016-04-23 11:26:04 +10:00
Nigel Tao	1f4d362d6d	Clarify the emitLiteral and emitCopy preconditions. This allows deleting some redundant code. name old speed new speed delta WordsEncode1e1-8 679MB/s ± 0% 678MB/s ± 0% ~ (p=0.087 n=5+5) WordsEncode1e2-8 87.5MB/s ± 0% 87.5MB/s ± 0% ~ (p=0.579 n=5+5) WordsEncode1e3-8 258MB/s ± 0% 257MB/s ± 1% ~ (p=1.000 n=5+5) WordsEncode1e4-8 243MB/s ± 0% 247MB/s ± 1% +1.77% (p=0.008 n=5+5) WordsEncode1e5-8 185MB/s ± 1% 186MB/s ± 0% ~ (p=0.095 n=5+5) WordsEncode1e6-8 210MB/s ± 2% 211MB/s ± 0% ~ (p=0.222 n=5+5) RandomEncode-8 13.2GB/s ± 1% 13.1GB/s ± 2% ~ (p=0.286 n=4+5) _ZFlat0-8 429MB/s ± 0% 433MB/s ± 0% +0.83% (p=0.016 n=4+5) _ZFlat1-8 274MB/s ± 0% 276MB/s ± 0% +0.91% (p=0.016 n=4+5) _ZFlat2-8 13.6GB/s ± 2% 13.8GB/s ± 1% ~ (p=0.095 n=5+5) _ZFlat3-8 169MB/s ± 0% 170MB/s ± 0% +0.38% (p=0.032 n=4+5) _ZFlat4-8 3.09GB/s ± 0% 3.09GB/s ± 1% ~ (p=0.905 n=4+5) _ZFlat5-8 419MB/s ± 5% 427MB/s ± 1% +1.83% (p=0.032 n=5+5) _ZFlat6-8 189MB/s ± 1% 190MB/s ± 0% +0.68% (p=0.016 n=4+5) _ZFlat7-8 181MB/s ± 1% 182MB/s ± 0% +0.60% (p=0.008 n=5+5) _ZFlat8-8 199MB/s ± 1% 200MB/s ± 0% +0.68% (p=0.008 n=5+5) _ZFlat9-8 175MB/s ± 0% 175MB/s ± 0% ~ (p=0.095 n=5+5) _ZFlat10-8 507MB/s ± 0% 507MB/s ± 0% ~ (p=0.222 n=5+5) _ZFlat11-8 277MB/s ± 0% 278MB/s ± 0% +0.38% (p=0.008 n=5+5)	2016-04-23 10:40:18 +10:00
Nigel Tao	774a97396f	Remove the no-longer-used maxOffset constant. It was no longer used as of commit `8939696c` "Use the same encoding algorithm as C++ snappy".	2016-04-16 15:15:34 +10:00
Nigel Tao	fa0b0e6289	Eliminate some bounds checks. It seems like a small win: benchmark old MB/s new MB/s speedup BenchmarkWordsEncode1e1-4 465.69 471.77 1.01x BenchmarkWordsEncode1e2-4 60.18 60.27 1.00x BenchmarkWordsEncode1e3-4 174.42 176.26 1.01x BenchmarkWordsEncode1e4-4 172.40 175.95 1.02x BenchmarkWordsEncode1e5-4 134.42 134.86 1.00x BenchmarkWordsEncode1e6-4 153.09 154.03 1.01x BenchmarkRandomEncode-4 6504.88 6553.55 1.01x Benchmark_ZFlat0-4 310.55 313.22 1.01x Benchmark_ZFlat1-4 198.43 199.73 1.01x Benchmark_ZFlat2-4 7915.02 8052.65 1.02x Benchmark_ZFlat3-4 123.07 123.53 1.00x Benchmark_ZFlat4-4 2220.35 2230.80 1.00x Benchmark_ZFlat5-4 307.05 309.51 1.01x Benchmark_ZFlat6-4 136.35 137.19 1.01x Benchmark_ZFlat7-4 130.67 131.33 1.01x Benchmark_ZFlat8-4 143.17 144.47 1.01x Benchmark_ZFlat9-4 125.40 125.85 1.00x Benchmark_ZFlat10-4 364.30 370.35 1.02x Benchmark_ZFlat11-4 200.04 199.80 1.00x	2016-04-12 10:37:28 +10:00
Nigel Tao	ef80b33e87	Change the encoder's hash table values from int32 to uint16. Doing s/int32/uint16/ in "var table [maxTableSize]int32" saves 32 KiB of stack space that needed zero'ing. maxTableSize is 1<<14, or 16384. We couldn't do this before, in commit `cc71ae7c`, since we didn't have maxBlockSize = 65536 at the time. benchmark old MB/s new MB/s speedup BenchmarkWordsEncode1e1-4 468.81 465.82 0.99x BenchmarkWordsEncode1e2-4 32.62 60.23 1.85x BenchmarkWordsEncode1e3-4 138.36 174.61 1.26x BenchmarkWordsEncode1e4-4 170.79 172.43 1.01x BenchmarkWordsEncode1e5-4 131.02 134.25 1.02x BenchmarkWordsEncode1e6-4 149.33 153.05 1.02x BenchmarkRandomEncode-4 5933.57 6846.03 1.15x Benchmark_ZFlat0-4 306.98 310.74 1.01x Benchmark_ZFlat1-4 194.65 198.74 1.02x Benchmark_ZFlat2-4 6784.51 8110.98 1.20x Benchmark_ZFlat3-4 64.06 123.43 1.93x Benchmark_ZFlat4-4 2102.05 2224.84 1.06x Benchmark_ZFlat5-4 303.89 307.19 1.01x Benchmark_ZFlat6-4 132.74 136.37 1.03x Benchmark_ZFlat7-4 126.50 130.72 1.03x Benchmark_ZFlat8-4 140.35 143.61 1.02x Benchmark_ZFlat9-4 121.73 125.62 1.03x Benchmark_ZFlat10-4 360.89 365.62 1.01x Benchmark_ZFlat11-4 195.92 199.96 1.02x	2016-04-11 12:06:28 +10:00
Nigel Tao	9bc0b5ad10	Make heuristic match skipping more aggressive. This is the Go equivalent of an algorithmic change in the C++ snappy code: `d53de18799` The discussion is at: https://groups.google.com/d/topic/snappy-compression/3Qa3fASLkNA/discussion benchmark old MB/s new MB/s speedup BenchmarkWordsEncode1e1-8 680.57 679.12 1.00x BenchmarkWordsEncode1e2-8 49.90 49.65 0.99x BenchmarkWordsEncode1e3-8 213.28 212.75 1.00x BenchmarkWordsEncode1e4-8 247.05 245.76 0.99x BenchmarkWordsEncode1e5-8 180.68 179.95 1.00x BenchmarkWordsEncode1e6-8 205.65 204.83 1.00x BenchmarkRandomEncode-8 5678.83 11217.33 1.98x Benchmark_ZFlat0-8 422.83 423.18 1.00x Benchmark_ZFlat1-8 269.60 271.01 1.01x Benchmark_ZFlat2-8 5517.16 11517.40 2.09x Benchmark_ZFlat3-8 92.47 92.39 1.00x Benchmark_ZFlat4-8 954.63 2947.73 3.09x Benchmark_ZFlat5-8 419.71 419.87 1.00x Benchmark_ZFlat6-8 184.13 183.45 1.00x Benchmark_ZFlat7-8 175.83 175.89 1.00x Benchmark_ZFlat8-8 193.49 193.84 1.00x Benchmark_ZFlat9-8 169.02 168.59 1.00x Benchmark_ZFlat10-8 500.19 499.85 1.00x Benchmark_ZFlat11-8 271.20 270.60 1.00x	2016-04-10 16:02:28 +10:00
Nigel Tao	cef980a12b	Add more commentary to minNonLiteralBlockSize.	2016-04-07 15:15:05 +10:00
Nigel Tao	6218a584d0	Clarify the semantics of minNonLiteralBlockSize.	2016-04-07 15:09:02 +10:00
Nigel Tao	8939696c22	Use the same encoding algorithm as C++ snappy. When encoding the benchmark files, the output size is smaller: len(in) old_len(out) new_len(out) new/old_ratio description 102400 23488 22842 0.97 html 702087 346345 335387 0.97 urls 123093 123034 123034 1.00 jpg 200 144 146 1.01 jpg_200 102400 83786 83817 1.00 pdf 409600 95095 92221 0.97 html4 152089 91386 88017 0.96 txt1 125179 80526 77525 0.96 txt2 426754 244658 234392 0.96 txt3 481861 331356 319097 0.96 txt4 118588 24789 23295 0.94 pb 184320 74129 69526 0.94 gaviota On GOARCH=amd64, the throughput numbers are also faster: benchmark old MB/s new MB/s speedup BenchmarkWordsEncode1e1-8 674.93 681.22 1.01x BenchmarkWordsEncode1e2-8 47.92 49.91 1.04x BenchmarkWordsEncode1e3-8 189.48 213.64 1.13x BenchmarkWordsEncode1e4-8 193.17 245.31 1.27x BenchmarkWordsEncode1e5-8 151.44 178.84 1.18x BenchmarkWordsEncode1e6-8 180.63 203.74 1.13x BenchmarkRandomEncode-8 4700.25 5711.91 1.22x Benchmark_ZFlat0-8 372.12 422.42 1.14x Benchmark_ZFlat1-8 187.62 270.16 1.44x Benchmark_ZFlat2-8 4891.26 5542.08 1.13x Benchmark_ZFlat3-8 86.16 92.53 1.07x Benchmark_ZFlat4-8 570.31 963.51 1.69x Benchmark_ZFlat5-8 366.84 418.91 1.14x Benchmark_ZFlat6-8 164.18 182.67 1.11x Benchmark_ZFlat7-8 155.23 175.64 1.13x Benchmark_ZFlat8-8 169.62 193.08 1.14x Benchmark_ZFlat9-8 149.43 168.62 1.13x Benchmark_ZFlat10-8 412.63 497.87 1.21x Benchmark_ZFlat11-8 247.98 269.43 1.09x	2016-04-03 11:18:01 +10:00
Nigel Tao	ebebc71721	Raise the "always encode as literal" size threshold from 4 to 14. This isn't an optimization per se, although it does trade off the "encode 10 bytes" benchmark to favor speed over output size. The point of this commit is to move closer to what the C++ snappy code does. benchmark old MB/s new MB/s speedup BenchmarkWordsEncode1e1-8 5.77 674.93 116.97x BenchmarkWordsEncode1e2-8 47.96 47.92 1.00x BenchmarkWordsEncode1e3-8 190.33 189.48 1.00x BenchmarkWordsEncode1e4-8 190.25 193.17 1.02x BenchmarkWordsEncode1e5-8 150.65 151.44 1.01x BenchmarkWordsEncode1e6-8 180.11 180.63 1.00x BenchmarkRandomEncode-8 4782.70 4700.25 0.98x Benchmark_ZFlat0-8 372.49 372.12 1.00x Benchmark_ZFlat1-8 186.49 187.62 1.01x Benchmark_ZFlat2-8 4979.47 4891.26 0.98x Benchmark_ZFlat3-8 85.76 86.16 1.00x Benchmark_ZFlat4-8 566.31 570.31 1.01x Benchmark_ZFlat5-8 366.01 366.84 1.00x Benchmark_ZFlat6-8 162.13 164.18 1.01x Benchmark_ZFlat7-8 153.69 155.23 1.01x Benchmark_ZFlat8-8 167.91 169.62 1.01x Benchmark_ZFlat9-8 147.71 149.43 1.01x Benchmark_ZFlat10-8 414.06 412.63 1.00x Benchmark_ZFlat11-8 248.87 247.98 1.00x	2016-04-03 09:25:18 +10:00
Nigel Tao	7ede8d1b13	Eliminate some bounds checks in the encoder. As per https://groups.google.com/d/msg/golang-dev/jVP6h21OyL8/Syhfot9XBQAJ, recent versions of the gc compiler can optimize: func load32(b []byte, i int32) uint32 { b = b[i : i+4 : len(b)] return uint32(b[0]) \| etc \| uint32(b[3])<<24 } benchmark old MB/s new MB/s speedup BenchmarkWordsEncode1e1-8 5.78 5.77 1.00x BenchmarkWordsEncode1e2-8 47.22 47.96 1.02x BenchmarkWordsEncode1e3-8 183.53 190.33 1.04x BenchmarkWordsEncode1e4-8 198.95 190.25 0.96x BenchmarkWordsEncode1e5-8 144.60 150.65 1.04x BenchmarkWordsEncode1e6-8 172.11 180.11 1.05x BenchmarkRandomEncode-8 4547.98 4782.70 1.05x Benchmark_ZFlat0-8 359.18 372.49 1.04x Benchmark_ZFlat1-8 181.57 186.49 1.03x Benchmark_ZFlat2-8 4566.75 4979.47 1.09x Benchmark_ZFlat3-8 86.00 85.76 1.00x Benchmark_ZFlat4-8 558.08 566.31 1.01x Benchmark_ZFlat5-8 354.18 366.01 1.03x Benchmark_ZFlat6-8 156.20 162.13 1.04x Benchmark_ZFlat7-8 147.76 153.69 1.04x Benchmark_ZFlat8-8 162.49 167.91 1.03x Benchmark_ZFlat9-8 142.33 147.71 1.04x Benchmark_ZFlat10-8 401.93 414.06 1.03x Benchmark_ZFlat11-8 235.94 248.87 1.05x	2016-04-03 08:30:29 +10:00
Nigel Tao	d1f56d2222	Encode copies of length 65, 66 or 67 as 5 bytes, not 6. The benchmarks don't show a big change either way, but the output is shorter, and it matches what the C++ snappy code does. benchmark old MB/s new MB/s speedup BenchmarkWordsEncode1e1-8 5.77 5.77 1.00x BenchmarkWordsEncode1e2-8 47.15 47.26 1.00x BenchmarkWordsEncode1e3-8 180.77 183.25 1.01x BenchmarkWordsEncode1e4-8 202.01 198.96 0.98x BenchmarkWordsEncode1e5-8 145.66 144.68 0.99x BenchmarkWordsEncode1e6-8 174.12 172.31 0.99x BenchmarkRandomEncode-8 4522.91 4495.78 0.99x Benchmark_ZFlat0-8 359.70 359.79 1.00x Benchmark_ZFlat1-8 181.18 181.82 1.00x Benchmark_ZFlat2-8 4612.52 4557.46 0.99x Benchmark_ZFlat3-8 85.65 84.82 0.99x Benchmark_ZFlat4-8 559.51 558.52 1.00x Benchmark_ZFlat5-8 354.88 352.91 0.99x Benchmark_ZFlat6-8 156.14 156.26 1.00x Benchmark_ZFlat7-8 148.18 148.12 1.00x Benchmark_ZFlat8-8 162.68 162.21 1.00x Benchmark_ZFlat9-8 141.81 142.32 1.00x Benchmark_ZFlat10-8 399.79 401.94 1.01x Benchmark_ZFlat11-8 237.43 235.91 0.99x	2016-04-02 16:22:40 +11:00
Nigel Tao	624b11c0e0	Fix some comment styles.	2016-02-25 15:30:12 +11:00
Nigel Tao	bf2ded9d81	Use 64K blocks when encoding long inputs. This enables future optimizations, such as an encoder's hash table entry being uint16 instead of int32.	2016-02-22 12:44:36 +11:00
Nigel Tao	d1d908a252	Fix heuristic match skipping. The heuristic was introduced in `4e2aa98e`, based on the C++ Snappy implementation, but the Go code contained a flawed optimization. The C++ code used an explicit skip variable: uint32 bytes_between_hash_lookups = skip++ >> 5; next_ip = ip + bytes_between_hash_lookups; whereas the Go code optimized this to be an implicit skip: s += 1 + (s-lit)>>5 This is equivalent for small s values (relative to lit, the last hash table hit), but diverges for large ones. This Go program demonstrates the difference: // main prints the encoder skipping behavior when seeing no hash table hits. func main() { s0, s1 := 0, 0 skip := 32 for i := 0; i < 300; i++ { // This is the C++ Snappy algorithm. bytes_between_hash_lookups := skip >> 5 skip++ s0 += bytes_between_hash_lookups // This is the Go Snappy algorithm. s1 += 1 + s1>>5 // The intention was that the Go algorithm behaves the same as the C++ // one, but it doesn't. if i%10 == 0 { fmt.Printf("%d\t%d\t%d\n", i, s0, s1) } } } It prints: 0 1 1 10 11 11 20 21 21 30 31 31 40 50 50 50 70 73 60 90 105 70 117 149 80 147 208 90 177 288 100 212 398 110 252 548 120 292 752 130 335 1030 140 385 1408 150 435 1922 160 486 2619 170 546 3568 180 606 4861 190 666 6617 200 735 9005 210 805 12257 220 875 16681 230 952 22697 240 1032 30881 250 1112 42015 260 1197 57161 270 1287 77764 280 1377 105791 290 1470 143914 The C++ algorithm is quadratic. The Go algorithm is exponential. This commit re-introduces the explicit skip variable, so that the Go implementation matches the C++ implementation. For completeness, benchmark numbers are included below, but the worse numbers merely reflect that the old Go algorithm was too aggressive on skipping ahead on incompressible input (RandomEncode, ZFlat2 and ZFlat4), and so after an initial warm-up period, it was essentially performing not much more than a memcpy. 
Memcpy is indeed fast in terms of MB/s, but it doesn't compress at all, which obviously defeats the whole purpose of a compression format like Snappy. benchmark old MB/s new MB/s speedup BenchmarkWordsEncode1e1-4 3.65 3.77 1.03x BenchmarkWordsEncode1e2-4 29.22 29.35 1.00x BenchmarkWordsEncode1e3-4 99.46 101.20 1.02x BenchmarkWordsEncode1e4-4 118.11 121.54 1.03x BenchmarkWordsEncode1e5-4 90.37 91.72 1.01x BenchmarkWordsEncode1e6-4 107.49 108.88 1.01x BenchmarkRandomEncode-4 7679.09 4491.97 0.58x Benchmark_ZFlat0-4 229.41 233.79 1.02x Benchmark_ZFlat1-4 115.10 116.83 1.02x Benchmark_ZFlat2-4 7256.88 3003.79 0.41x Benchmark_ZFlat3-4 53.39 54.02 1.01x Benchmark_ZFlat4-4 1873.63 289.28 0.15x Benchmark_ZFlat5-4 233.29 234.95 1.01x Benchmark_ZFlat6-4 101.33 102.79 1.01x Benchmark_ZFlat7-4 95.26 96.63 1.01x Benchmark_ZFlat8-4 105.66 106.89 1.01x Benchmark_ZFlat9-4 92.04 93.11 1.01x Benchmark_ZFlat10-4 265.68 265.93 1.00x Benchmark_ZFlat11-4 149.72 151.32 1.01x These numbers were generated on an amd64 machine, but on a different machine than the one used for other recent commits. The raw MB/s numbers are therefore not directly comparable, although the speedup numbers should be.	2016-02-14 16:54:35 +11:00
Nigel Tao	c2359a1bd0	Catch MaxEncodedLen overflow.	2016-02-13 20:09:53 +11:00
Nigel Tao	cc71ae7cc5	Change the encoder's hash table values from int to int32. Doing s/int/int32/ in "var table [maxTableSize]int" saves 64 KiB of stack space that needed zero'ing. maxTableSize is 1<<14, or 16384. The benchmarks show the biggest effect for small src lengths, or for mostly uncompressible data such as the JPEG file (possibly because the multiple-byte skipping means that the src is effectively short). On amd64: benchmark old MB/s new MB/s speedup BenchmarkWordsEncode1e1-8 3.05 5.71 1.87x BenchmarkWordsEncode1e2-8 26.98 44.87 1.66x BenchmarkWordsEncode1e3-8 130.87 156.72 1.20x BenchmarkWordsEncode1e4-8 162.48 180.89 1.11x BenchmarkWordsEncode1e5-8 132.35 131.27 0.99x BenchmarkWordsEncode1e6-8 159.97 158.49 0.99x BenchmarkRandomEncode-8 12340.86 13485.69 1.09x Benchmark_ZFlat0-8 329.92 329.17 1.00x Benchmark_ZFlat1-8 165.06 164.46 1.00x Benchmark_ZFlat2-8 8955.25 10530.49 1.18x Benchmark_ZFlat3-8 47.79 80.06 1.68x Benchmark_ZFlat4-8 2650.55 2732.00 1.03x Benchmark_ZFlat5-8 336.52 334.94 1.00x Benchmark_ZFlat6-8 147.99 145.85 0.99x Benchmark_ZFlat7-8 136.32 137.20 1.01x Benchmark_ZFlat8-8 153.03 152.15 0.99x Benchmark_ZFlat9-8 133.18 131.74 0.99x Benchmark_ZFlat10-8 376.02 378.28 1.01x Benchmark_ZFlat11-8 224.16 216.81 0.97x Thanks to Klaus Post for the original suggestion on https://github.com/golang/snappy/pull/23 but I hesitate to accept that pull request in its entirety as it makes many changes, some more complicated than this separable, self-contained s/int/int32/ change.	2016-02-13 14:11:38 +11:00
Nigel Tao	07070fd417	Catch overflow when incrementing src pointers.	2016-02-12 16:49:07 +11:00
Nigel Tao	799c780093	Reduce the number of Write calls to the underlying io.Writer.	2016-02-11 17:10:40 +11:00
Nigel Tao	0fd139378b	Add NewBufferedWriter, and Flush and Close methods. Deprecate NewWriter. See the discussion on https://groups.google.com/d/topic/golang-dev/nXp12KmMSvM/discussion	2016-02-11 15:44:51 +11:00
Nigel Tao	4e2aa98ebb	Skip multiple bytes if the last match was >= 32 bytes prior. benchmark old MB/s new MB/s speedup BenchmarkWordsEncode1e3-8 137.99 132.57 0.96x BenchmarkWordsEncode1e4-8 173.30 156.26 0.90x BenchmarkWordsEncode1e5-8 137.16 132.59 0.97x BenchmarkWordsEncode1e6-8 165.45 164.47 0.99x BenchmarkRandomEncode-8 140.04 12260.44 87.55x Benchmark_ZFlat0-8 334.14 335.84 1.01x Benchmark_ZFlat1-8 168.93 168.19 1.00x Benchmark_ZFlat2-8 134.42 8763.96 65.20x Benchmark_ZFlat3-8 48.04 47.36 0.99x Benchmark_ZFlat4-8 151.86 2578.12 16.98x Benchmark_ZFlat5-8 344.43 341.94 0.99x Benchmark_ZFlat6-8 149.21 147.24 0.99x Benchmark_ZFlat7-8 140.87 138.72 0.98x Benchmark_ZFlat8-8 155.95 155.89 1.00x Benchmark_ZFlat9-8 135.05 136.07 1.01x Benchmark_ZFlat10-8 380.98 379.77 1.00x Benchmark_ZFlat11-8 227.48 226.59 1.00x Thanks to Klaus Post for the original suggestion. Unfortunately, https://github.com/golang/snappy/pull/19 was abandoned.	2016-02-10 14:31:44 +11:00
Damian Gryski	ec7b924342	C++ snappy has moved to github	2015-07-21 09:14:30 +02:00
Nigel Tao	2a6d64140d	Have Encode return []byte instead of ([]byte, error). Encoding can never fail, and returning an error is inconsistent with the standard library's encoding/{ascii85,hex,pem} packages. Fixes #8.	2015-07-17 17:21:07 +10:00
Sebastien Binet	c5eccb269a	all: simpler import path	2015-06-11 10:18:30 +02:00
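
Commit d1d908a252 quotes a small Go program that contrasts the C++ skipping rule with the old Go one. A runnable layout of that program is sketched below. The helper name `skipPositions` and its slice-returning shape are introduced here purely for illustration and are not part of the snappy package; as in the commit's program, `lit` (the last hash table hit) is taken to be 0.

```go
package main

import "fmt"

// skipPositions is a hypothetical helper (not part of snappy). It simulates n
// consecutive hash table misses and returns the input position reached after
// each miss under both rules discussed in commit d1d908a252.
func skipPositions(n int) (cpp, old []int) {
	s0, s1 := 0, 0
	skip := 32
	for i := 0; i < n; i++ {
		// C++ Snappy rule: an explicit skip counter that grows by one per miss.
		bytesBetweenHashLookups := skip >> 5
		skip++
		s0 += bytesBetweenHashLookups

		// Old Go rule: the step is derived from the current position itself,
		// so each step grows in proportion to the distance already covered.
		s1 += 1 + s1>>5

		cpp = append(cpp, s0)
		old = append(old, s1)
	}
	return cpp, old
}

func main() {
	cpp, old := skipPositions(300)
	// Print every tenth row, in the same layout as the commit message.
	for i := 0; i < len(cpp); i += 10 {
		fmt.Printf("%d\t%d\t%d\n", i, cpp[i], old[i])
	}
}
```

The early rows agree with the output quoted in the commit message (row 40 is 50 and 50, row 50 is 70 and 73), after which the old Go column pulls away from the C++ one.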

32 commits
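
Commit 7ede8d1b13 abbreviates its `load32` example with "etc". A complete version might look like the sketch below, assuming the elided terms are the two middle bytes of a little-endian load. The `main` function and the sample bytes are illustrative only.

```go
package main

import "fmt"

// load32 reads the four bytes starting at b[i] as a little-endian uint32.
// Re-slicing b to exactly four elements up front is the pattern that, per
// the commit message, recent gc compilers can optimize: the single slice
// expression carries the bounds check for all four byte loads.
func load32(b []byte, i int32) uint32 {
	b = b[i : i+4 : len(b)]
	// Assumption: "etc" in the commit message stands for the b[1] and b[2] terms.
	return uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24
}

func main() {
	src := []byte{0x01, 0x02, 0x03, 0x04, 0x05}
	// Load the four bytes at offsets 1 through 4.
	fmt.Printf("%#x\n", load32(src, 1))
}
```

Passing an index with fewer than four bytes remaining still panics at the slice expression, so callers remain responsible for staying in range.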