Laura Radville
f3b2e729cb
Merge pull request #17 from github/radville/prepare-frozen-strings
Prepare for frozen strings: mark strings as mutable to allow us to set --enable-frozen-string-literal globally
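This is a minimal sketch of what "marking strings as mutable" typically looks like in Ruby. It assumes the goal is code that keeps working when every literal is frozen, whether through the `# frozen_string_literal: true` magic comment or `--enable-frozen-string-literal`. `build_header` and `build_label` are hypothetical helpers, not functions from this gem.

```ruby
# frozen_string_literal: true

# Hypothetical helpers (not from this gem). Under frozen string literals a
# bare "" is frozen, so calling << on it would raise FrozenError. Both
# buffers below are created unfrozen instead.

def build_header(version_byte)
  out = String.new(capacity: 16) # new, unfrozen, ASCII-8BIT (binary) string
  out << version_byte.chr        # Integer#chr: a one-byte string
  out
end

def build_label(name)
  label = +""                    # unary plus: unfrozen copy of a frozen literal
  label << "tag:" << name        # appending frozen literals to it is fine
  label
end

header = build_header(131)
label  = build_label("foo")
```

`String.new` with no string argument returns a binary string, which suits encoder output; `+""` keeps the source file's encoding.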
2024-09-05 11:04:50 -04:00
Laura Radville
d941a7db0e
marks strings as mutable
2024-09-05 10:11:32 -04:00
Colin Stolley
aa7e3f5d12
Merge pull request #16 from github/ccs-upgrade-deps
Upgrade yajl-ruby and rake.
2021-03-18 12:12:37 -07:00
Colin Stolley
2371d7102a
Upgrade yajl-ruby and rake.
Dependabot said these dependencies contain security vulnerabilities,
so let's upgrade them.
2021-03-17 14:46:58 -05:00
Carlos Martín Nieto
2a626e0bf0
Merge pull request #15 from github/cmn/bert-as-module
decode: define BERT module instead of trying to look it up
2020-10-30 10:26:06 +01:00
Carlos Martín Nieto
59dd530be7
decode: do not keep a reference to classes across calls
We do not want to keep the `Bert::Tuple` and `Mochilo` references and re-use
those values across calls. This has never been something we should do, but with
the advent of the compacting GC it becomes even more important, since a cached
`VALUE` can be moved out from under us.
Look up these values when we need them, and use them only inside a single call
into the C code.
2020-10-30 09:25:10 +01:00
Carlos Martín Nieto
901ef50958
decode: define BERT module instead of trying to look it up
We want to make sure we are dealing with a module, as we expect. Looking up the
object seems to result in the runtime re-using the returned `VALUE`, which leads
to us calling `supports?` on arbitrary objects when trying to decode.
The existing pattern for re-opening modules is to use `rb_define_module`, so
let's do that.
2020-10-29 17:19:47 +01:00
Carlos Martín Nieto
f6727fc37b
Merge pull request #14 from github/slice-instead-of-dup
Use shared strings to decrease allocated bytes.
2018-10-12 16:58:52 +02:00
Aaron Patterson
f6b34837c7
Apply same shared string optimization in Ruby
2018-10-11 08:56:36 -07:00
Aaron Patterson
5a447736d2
Use shared strings to decrease allocated bytes.
This commit uses string byteslice rather than rb_str_new so that we can
take advantage of Ruby's shared string optimizations. Since we are
slicing all the way to the end of a buffer, we can use a trick which
lets us share byte buffers between string objects.
Here is the benchmark code:
```ruby
require 'bert'
require 'allocation_tracer'
zeroes = "0" * (1024*1024*50)
BERT::Encode.version = :v4
buf = BERT.encode zeroes
ObjectSpace::AllocationTracer.setup(%i{path line type})
result = ObjectSpace::AllocationTracer.trace do
5.times { BERT.decode(buf) }
end
p [:file, :line, :type, :bytes]
result.each do |(file, line, type), info|
p [file, line, type, info.last]
end
```
Here is the output before the change:
```
[aaron@TC-265 bert (slice-instead-of-dup)]$ be ruby -I lib:~/git/allocation_tracer/lib test.rb
[:file, :line, :type, :bytes]
["/Users/aaron/github/bert/lib/bert/decoder.rb", 8, :T_STRING, 209715374]
["/Users/aaron/github/bert/lib/bert/decoder.rb", 8, :T_IMEMO, 0]
```
Here is the output after this patch:
```
[aaron@TC-265 bert (slice-instead-of-dup)]$ be ruby -I lib:~/git/allocation_tracer/lib test.rb
[:file, :line, :type, :bytes]
["/Users/aaron/github/bert/lib/bert/decoder.rb", 8, :T_STRING, 52428961]
["/Users/aaron/github/bert/lib/bert/decoder.rb", 8, :T_IMEMO, 0]
```
Since we share string buffers, the decode method allocates much less
memory.
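A small Ruby sketch of the "slice all the way to the end" idea, assuming a payload whose first byte is a tag (131 is the Erlang external term format version byte). `split_tag` is a hypothetical helper, not the gem's decoder.

```ruby
# Hypothetical helper: peel off a leading tag byte and return the rest.
# Because the slice runs through the final byte of `buf`, CRuby can let
# `rest` share buf's byte buffer rather than copying the tail.
def split_tag(buf)
  tag  = buf.getbyte(0)
  rest = buf.byteslice(1, buf.bytesize - 1)
  [tag, rest]
end

payload   = "\x83".b + ("0" * 1024) # binary: tag byte followed by 1024 bytes
tag, rest = split_tag(payload)
```

One way to inspect the sharing on CRuby is `ObjectSpace.memsize_of` from `objspace`; the exact numbers vary across Ruby versions, and other implementations may copy.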
2018-10-10 11:50:44 -07:00
Carlos Martín Nieto
ca51092ef6
Merge pull request #12 from spraints/msgpack
Add v3 and v4 encodings, using mochilo
2017-06-12 14:25:49 +02:00
Matt Burke
fbae1a156c
brianmario/mochilo#20 is merged
2017-06-06 16:37:02 -04:00
Matt Burke
10bb1e3203
Update dependencies so that a new enough mochilo is used
In the gemspec, limit the mochilo version to something no older than
1.3 (brianmario/mochilo#20) for the 1.x series, and no older than 2.1
(brianmario/mochilo#19) for the 2.x series.
In the main Gemfile and in the v2 CI gemfile, bundle from github.com
because there hasn't been a gem cut yet.
In the v1 CI gemfile, bundle from the PR branch.
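The version rule above can be expressed with RubyGems' own requirement objects. This sketch only checks a version string against the two ranges; the actual gemspec and Gemfile lines are not shown in this log, and `acceptable_mochilo?` is a hypothetical name. A single `Gem::Requirement` is a conjunction of constraints, so the "1.x or 2.x" rule is written here as two requirements joined with `||`.

```ruby
require 'rubygems'

# "~> 1.3" means >= 1.3 and < 2.0; "~> 2.1" means >= 2.1 and < 3.0.
MOCHILO_1X = Gem::Requirement.new('~> 1.3')
MOCHILO_2X = Gem::Requirement.new('~> 2.1')

# Hypothetical helper: true when the version is new enough for its series.
def acceptable_mochilo?(version)
  v = Gem::Version.new(version)
  MOCHILO_1X.satisfied_by?(v) || MOCHILO_2X.satisfied_by?(v)
end
```

For example, `acceptable_mochilo?('2.0.3')` is false because 2.0.x predates mochilo#19, while `'1.3'` and `'2.1'` both pass.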
2017-06-06 16:08:36 -04:00
Matt Burke
c4a1f2ca03
mochilo v1 == bert v3, mochilo v2 == bert v4
2017-05-30 12:32:32 -04:00
Matt Burke
f36d0c6ad5
Use a varying mochilo version in CI
2017-05-30 09:47:15 -04:00
Matt Burke
7ae632e0d5
Bump mochilo
2017-05-26 16:11:53 -04:00
Matt Burke
71b04468f7
C implementation for v3
Even though the C implementation calls back to Ruby when it runs
mochilo, the v3 BERT decoder is still faster in C than in Ruby.
BERT_TEST_IMPL=C bundle exec ruby bench/bench.rb:
```
                        user     system      total        real
BERT v1 tiny        0.000000   0.000000   0.000000 (  0.009276)
BERT v1 small       0.070000   0.000000   0.070000 (  0.091471)
BERT v1 large       0.310000   0.030000   0.340000 (  0.342087)
BERT v1 complex     1.130000   0.010000   1.140000 (  1.144703)
BERT v1 long array  2.240000   0.010000   2.250000 (  2.257003)
BERT v2 tiny        0.010000   0.000000   0.010000 (  0.008312)
BERT v2 small       0.060000   0.000000   0.060000 (  0.060999)
BERT v2 large       0.320000   0.020000   0.340000 (  0.346957)
BERT v2 complex     1.140000   0.000000   1.140000 (  1.135720)
BERT v2 long array  2.020000   0.000000   2.020000 (  2.037739)
BERT v3 tiny        0.000000   0.000000   0.000000 (  0.003123)
BERT v3 small       0.000000   0.000000   0.000000 (  0.003621)
BERT v3 large       0.240000   0.040000   0.280000 (  0.289285)
BERT v3 complex     0.040000   0.000000   0.040000 (  0.036944)
BERT v3 long array  0.170000   0.000000   0.170000 (  0.170969)
```
BERT_TEST_IMPL=Ruby bundle exec ruby bench/bench.rb:
```
                        user     system      total        real
BERT v1 tiny        0.010000   0.000000   0.010000 (  0.013944)
BERT v1 small       0.130000   0.000000   0.130000 (  0.124755)
BERT v1 large       0.560000   0.010000   0.570000 (  0.571472)
BERT v1 complex     2.750000   0.000000   2.750000 (  2.758103)
BERT v1 long array  5.440000   0.000000   5.440000 (  5.476571)
BERT v2 tiny        0.020000   0.000000   0.020000 (  0.017456)
BERT v2 small       0.140000   0.000000   0.140000 (  0.136073)
BERT v2 large       0.640000   0.000000   0.640000 (  0.648367)
BERT v2 complex     2.610000   0.000000   2.610000 (  2.627479)
BERT v2 long array  5.490000   0.010000   5.500000 (  5.519414)
BERT v3 tiny        0.010000   0.000000   0.010000 (  0.003783)
BERT v3 small       0.000000   0.000000   0.000000 (  0.005775)
BERT v3 large       0.240000   0.010000   0.250000 (  0.251264)
BERT v3 complex     0.040000   0.000000   0.040000 (  0.037354)
BERT v3 long array  0.180000   0.000000   0.180000 (  0.183192)
```
2017-05-26 15:18:23 -04:00
Matt Burke
a32b6cc101
Use brianmario/mochilo#19
The long array time was:
```
BERT v3 long array  0.520000   0.000000   0.520000 (  0.522735)
```
and now it is:
```
BERT v3 long array  0.190000   0.000000   0.190000 (  0.183106)
```
The full benchmark output:
```
                        user     system      total        real
BERT v1 tiny        0.010000   0.000000   0.010000 (  0.013550)
BERT v1 small       0.120000   0.020000   0.140000 (  0.124408)
BERT v1 large       0.520000   0.000000   0.520000 (  0.534007)
BERT v1 complex     2.730000   0.010000   2.740000 (  2.758483)
BERT v1 long array  5.150000   0.030000   5.180000 (  5.202551)
BERT v2 tiny        0.020000   0.000000   0.020000 (  0.019604)
BERT v2 small       0.120000   0.000000   0.120000 (  0.121305)
BERT v2 large       0.630000   0.000000   0.630000 (  0.630657)
BERT v2 complex     2.590000   0.000000   2.590000 (  2.586704)
BERT v2 long array  5.390000   0.020000   5.410000 (  5.453956)
BERT v3 tiny        0.000000   0.000000   0.000000 (  0.005028)
BERT v3 small       0.000000   0.010000   0.010000 (  0.006714)
BERT v3 large       0.220000   0.040000   0.260000 (  0.259619)
BERT v3 complex     0.040000   0.000000   0.040000 (  0.040061)
BERT v3 long array  0.190000   0.000000   0.190000 (  0.183106)
JSON tiny           0.000000   0.000000   0.000000 (  0.008714)
JSON small          0.020000   0.000000   0.020000 (  0.013369)
JSON large          2.280000   0.010000   2.290000 (  2.332599)
JSON complex        0.150000   0.000000   0.150000 (  0.145547)
JSON long array     0.530000   0.000000   0.530000 (  0.526635)
YAJL tiny           0.000000   0.000000   0.000000 (  0.005164)
YAJL small          0.020000   0.000000   0.020000 (  0.016937)
YAJL large          0.940000   0.030000   0.970000 (  0.987986)
YAJL complex        0.230000   0.010000   0.240000 (  0.233397)
YAJL long array     0.710000   0.000000   0.710000 (  0.710261)
Ruby tiny           0.000000   0.000000   0.000000 (  0.003004)
Ruby small          0.000000   0.000000   0.000000 (  0.005335)
Ruby large          0.020000   0.000000   0.020000 (  0.010745)
Ruby complex        0.010000   0.000000   0.010000 (  0.014781)
Ruby long array     0.030000   0.010000   0.040000 (  0.033941)
```
2017-05-26 14:49:06 -04:00
Matt Burke
b5e83ef0d8
Serialize symbol, time, regexp in the benchmark, too
```
                        user     system      total        real
BERT v1 tiny        0.010000   0.000000   0.010000 (  0.013280)
BERT v1 small       0.110000   0.000000   0.110000 (  0.116110)
BERT v1 large       0.630000   0.010000   0.640000 (  0.646742)
BERT v1 complex     2.710000   0.000000   2.710000 (  2.738346)
BERT v1 long array  5.360000   0.000000   5.360000 (  5.391206)
BERT v2 tiny        0.020000   0.000000   0.020000 (  0.018202)
BERT v2 small       0.140000   0.000000   0.140000 (  0.133662)
BERT v2 large       0.640000   0.010000   0.650000 (  0.660656)
BERT v2 complex     2.520000   0.010000   2.530000 (  2.534638)
BERT v2 long array  5.460000   0.010000   5.470000 (  5.520119)
BERT v3 tiny        0.010000   0.000000   0.010000 (  0.010693)
BERT v3 small       0.010000   0.000000   0.010000 (  0.011390)
BERT v3 large       0.220000   0.030000   0.250000 (  0.246068)
BERT v3 complex     0.060000   0.000000   0.060000 (  0.055431)
BERT v3 long array  0.520000   0.000000   0.520000 (  0.522735)
JSON tiny           0.000000   0.000000   0.000000 (  0.005991)
JSON small          0.020000   0.000000   0.020000 (  0.010092)
JSON large          2.200000   0.020000   2.220000 (  2.252311)
JSON complex        0.160000   0.000000   0.160000 (  0.159742)
JSON long array     0.450000   0.010000   0.460000 (  0.454021)
YAJL tiny           0.010000   0.000000   0.010000 (  0.006357)
YAJL small          0.020000   0.000000   0.020000 (  0.023327)
YAJL large          0.960000   0.020000   0.980000 (  1.002778)
YAJL complex        0.280000   0.030000   0.310000 (  0.317845)
YAJL long array     0.750000   0.000000   0.750000 (  0.747592)
Ruby tiny           0.000000   0.000000   0.000000 (  0.003547)
Ruby small          0.010000   0.000000   0.010000 (  0.007403)
Ruby large          0.010000   0.000000   0.010000 (  0.012739)
Ruby complex        0.020000   0.000000   0.020000 (  0.015415)
Ruby long array     0.030000   0.000000   0.030000 (  0.036519)
```
2017-05-26 14:48:03 -04:00
Matt Burke
5c272f0635
v4 (mochilo) -> v3
2017-04-27 14:30:54 -04:00
Matt Burke
f7852d7712
remove byebug dependency
2017-04-24 17:58:16 -04:00
Matt Burke
40e478c76b
Make v4, using mochilo
mochilo is like msgpack, except that it deals with encodings in a
sane(r?) way. mochilo doesn't have a custom type registry, so I've added
one in the "custom-type-registry" branch, and I'm using that here.
This ended up being faster than the msgpack implementation, because
encoding doesn't have to call back up into Ruby to deal with strings.
```
                        user     system      total        real
BERT v2 tiny        0.010000   0.000000   0.010000 (  0.015289)
BERT v2 small       0.130000   0.000000   0.130000 (  0.126682)
BERT v2 large       0.580000   0.000000   0.580000 (  0.592555)
BERT v2 complex     2.880000   0.010000   2.890000 (  2.903082)
BERT v2 long array  4.200000   0.000000   4.200000 (  4.262165)
BERT v3 tiny        0.020000   0.000000   0.020000 (  0.020252)
BERT v3 small       0.020000   0.000000   0.020000 (  0.024444)
BERT v3 large       0.700000   0.010000   0.710000 (  0.703152)
BERT v3 complex     0.090000   0.000000   0.090000 (  0.100372)
BERT v3 long array  1.810000   0.010000   1.820000 (  1.827934)
BERT v4 tiny        0.010000   0.000000   0.010000 (  0.013301)
BERT v4 small       0.010000   0.000000   0.010000 (  0.012317)
BERT v4 large       0.230000   0.060000   0.290000 (  0.292852)
BERT v4 complex     0.070000   0.000000   0.070000 (  0.064578)
BERT v4 long array  0.110000   0.000000   0.110000 (  0.110842)
Msgpack tiny        0.000000   0.000000   0.000000 (  0.001644)
Msgpack small       0.010000   0.000000   0.010000 (  0.002568)
Msgpack large       0.180000   0.000000   0.180000 (  0.192238)
Msgpack complex     0.040000   0.000000   0.040000 (  0.041482)
Msgpack long array  0.070000   0.000000   0.070000 (  0.065986)
```
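The custom type registry from the "custom-type-registry" branch isn't shown in this log. As a rough, stdlib-only illustration of the idea (not mochilo's API), a registry maps each Ruby class to a one-byte extension code plus pack/unpack procs, so an encoder can look up how to serialize types it doesn't handle natively. `TypeRegistry`, its methods, and the codes 1 and 2 are all hypothetical.

```ruby
# Hypothetical registry, not mochilo's API: class <-> extension code, with
# procs that turn an object into a String payload and back.
class TypeRegistry
  Entry = Struct.new(:code, :klass, :packer, :unpacker)

  def initialize
    @by_class = {}
    @by_code  = {}
  end

  def register(code, klass, packer:, unpacker:)
    entry = Entry.new(code, klass, packer, unpacker)
    @by_class[klass] = entry
    @by_code[code]   = entry
    self
  end

  # Returns [code, payload] for an object whose exact class is registered,
  # or nil otherwise.
  def dump(obj)
    entry = @by_class[obj.class]
    entry && [entry.code, entry.packer.call(obj)]
  end

  # Rebuilds an object from its extension code and payload.
  def load(code, payload)
    @by_code.fetch(code).unpacker.call(payload)
  end
end

registry = TypeRegistry.new
registry.register(1, Symbol, packer: ->(s) { s.to_s }, unpacker: ->(p) { p.to_sym })
# Regexp options are dropped here to keep the sketch short.
registry.register(2, Regexp, packer: ->(r) { r.source }, unpacker: ->(p) { Regexp.new(p) })
```

The lookup is by exact class, so subclasses would need their own entries; a real implementation might walk `ancestors` instead.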
2017-04-24 17:49:30 -04:00
Matt Burke
70d85c7dbc
Use a modified msgpack, so that String can be customized
This is slower by a lot.
```
< BERT v3 long array  0.080000   0.000000   0.080000 (  0.083473)
> BERT v3 long array  1.960000   0.000000   1.960000 (  1.981249)
```
2017-04-24 15:45:13 -04:00
Matt Burke
5c88d933e0
benchmark the new BERT encoder
It's slightly slower than plain msgpack, but still much better than
older BERT.
```
                        user     system      total        real
BERT v1 tiny        0.020000   0.000000   0.020000 (  0.019957)
BERT v1 small       0.150000   0.000000   0.150000 (  0.141066)
BERT v1 large       0.710000   0.000000   0.710000 (  0.723820)
BERT v1 complex     2.950000   0.000000   2.950000 (  2.972712)
BERT v1 long array  3.870000   0.000000   3.870000 (  3.898899)
BERT v2 tiny        0.010000   0.000000   0.010000 (  0.017890)
BERT v2 small       0.160000   0.000000   0.160000 (  0.159581)
BERT v2 large       0.720000   0.000000   0.720000 (  0.733793)
BERT v2 complex     2.960000   0.010000   2.970000 (  2.975499)
BERT v2 long array  4.080000   0.000000   4.080000 (  4.106795)
BERT v3 tiny        0.020000   0.000000   0.020000 (  0.024472)
BERT v3 small       0.020000   0.000000   0.020000 (  0.025297)
BERT v3 large       0.320000   0.020000   0.340000 (  0.337157)
BERT v3 complex     0.090000   0.000000   0.090000 (  0.085884)
BERT v3 long array  0.100000   0.000000   0.100000 (  0.104439)
JSON tiny           0.010000   0.000000   0.010000 (  0.007604)
JSON small          0.010000   0.000000   0.010000 (  0.014245)
JSON large          2.450000   0.010000   2.460000 (  2.473644)
JSON complex        0.170000   0.000000   0.170000 (  0.174540)
JSON long array     0.350000   0.000000   0.350000 (  0.357081)
YAJL tiny           0.010000   0.000000   0.010000 (  0.005566)
YAJL small          0.020000   0.000000   0.020000 (  0.019177)
YAJL large          1.060000   0.030000   1.090000 (  1.094342)
YAJL complex        0.300000   0.000000   0.300000 (  0.307467)
YAJL long array     0.140000   0.000000   0.140000 (  0.136261)
Ruby tiny           0.000000   0.000000   0.000000 (  0.002919)
Ruby small          0.010000   0.000000   0.010000 (  0.008211)
Ruby large          0.000000   0.010000   0.010000 (  0.009826)
Ruby complex        0.020000   0.000000   0.020000 (  0.013560)
Ruby long array     0.080000   0.000000   0.080000 (  0.082170)
Msgpack tiny        0.000000   0.000000   0.000000 (  0.002561)
Msgpack small       0.000000   0.000000   0.000000 (  0.003771)
Msgpack large       0.190000   0.000000   0.190000 (  0.195700)
Msgpack complex     0.040000   0.010000   0.050000 (  0.044514)
Msgpack long array  0.070000   0.000000   0.070000 (  0.071905)
```
2017-04-21 10:36:27 -04:00
Matt Burke
2fb106607f
do less processing on a msgpackable object
* skip 'convert', because the decomposition isn't interesting.
* register more types, so that msgpack can deal with all the types that
BERT can
2017-04-21 10:33:45 -04:00
Matt Burke
d733ee9eac
set up serialization of symbols
2017-04-21 10:00:09 -04:00
Matt Burke
c5268506d2
decode msgpack-based bert
2017-04-21 09:45:05 -04:00
Matt Burke
b710534fa3
Make a basic v3(msgpack) encoder
2017-04-21 09:39:57 -04:00
Matt Burke
9ed0dc3515
bert_test: restructure to prep for v3 tests
2017-04-21 09:16:21 -04:00
Matt Burke
4fa5e2e2d3
Benchmark dat msgpack
2017-04-21 09:03:53 -04:00
Charlie Somerville
1405542759
Merge pull request #11 from github/ci
Set up Travis CI
2017-01-12 12:17:20 +11:00
Charlie Somerville
ca38663411
only serialise a Time object with microsecond precision
...
Time.now on Linux returns a Time instance with nanosecond precision,
but BERT only serialises times down to microsecond precision.
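A sketch of why sub-microsecond precision can't survive a round trip, assuming the BERT time layout `{bert, time, MegaSecs, Secs, MicroSecs}`. `time_to_parts` and `parts_to_time` are hypothetical helpers, not the gem's encoder.

```ruby
# Split a Time into BERT-style parts. Time#usec returns whole microseconds,
# so any nanosecond digits are discarded here.
def time_to_parts(time)
  megasecs, secs = time.to_i.divmod(1_000_000)
  [megasecs, secs, time.usec]
end

# Rebuild a Time from the parts; Time.at(seconds, microseconds).
def parts_to_time(megasecs, secs, usecs)
  Time.at(megasecs * 1_000_000 + secs, usecs)
end

# A timestamp with nanosecond precision, like Time.now on Linux.
original  = Time.at(1_484_180_000, 123_456_789, :nsec)
parts     = time_to_parts(original)
roundtrip = parts_to_time(*parts)

# Truncating to microseconds before encoding gives a value that round-trips.
truncated = Time.at(original.to_i, original.usec)
```

Here `roundtrip` differs from `original` (its nanoseconds become 123456000) but equals `truncated`.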
2017-01-12 12:13:18 +11:00
Charlie Somerville
57c7b1de79
test ruby and c implementations
2017-01-12 12:13:18 +11:00
Charlie Somerville
59692ed4a5
set up .travis.yml with ruby 2.3 and 2.4
2017-01-12 12:13:18 +11:00
Charlie Somerville
38fdbbf98d
fix unused variable warnings when testing with -w
2017-01-12 12:13:18 +11:00
Charlie Somerville
0dcc7eee8f
Merge pull request #9 from github/integer-unification
Ruby 2.4 support
2017-01-12 12:12:09 +11:00
Charlie Somerville
83ffa7f301
Merge pull request #10 from arthurschreiber/arthur/fix-encoding-failures
Make sure we only collect binary encoded strings in our buffer.
2017-01-12 12:05:00 +11:00
Arthur Schreiber
76fb02aca8
Make sure we only collect binary encoded strings in our buffer.
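To see the failure this guards against: in Ruby, concatenating a binary string that contains bytes >= 0x80 with a UTF-8 string that contains non-ASCII characters raises `Encoding::CompatibilityError`. A minimal sketch of the problem and of one fix, re-tagging every piece as binary with `String#b` before joining; the byte values and payload are made up for illustration.

```ruby
prefix  = [0x83, 0x6d].pack('C*') # binary (ASCII-8BIT), contains byte 0x83
payload = "héllo"                 # UTF-8 with a non-ASCII character

# Joining the raw pieces mixes two incompatible encodings.
clash =
  begin
    [prefix, payload].join
    false
  rescue Encoding::CompatibilityError
    true
  end

# String#b returns a copy tagged ASCII-8BIT without changing any bytes,
# so every piece in the buffer shares one encoding and joins cleanly.
encoded = [prefix, payload].map(&:b).join
```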
2017-01-11 10:10:14 +01:00
Charlie Somerville
3c96b8154c
add test-unit as development dependency
2017-01-11 15:05:26 +11:00
Charlie Somerville
0aa53733e4
update rake dev dependency for ruby 2.4
2017-01-11 15:02:28 +11:00
Charlie Somerville
5ff7e73aa5
update yajl-ruby dev dependency for ruby 2.4
2017-01-11 15:01:37 +11:00
Charlie Somerville
d8de16106a
replace Fixnum and Bignum with Integer
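With Ruby 2.4's integer unification, `Fixnum` and `Bignum` fold into `Integer`, so type dispatch branches on `Integer` and picks the wire format from the value's range. A sketch of that pattern using the Erlang external term format integer tags that BERT builds on; `integer_tag` is a hypothetical helper and may not match exactly how this gem chooses encodings.

```ruby
# Erlang external term format tags for integers.
SMALL_INTEGER_EXT = 97  # unsigned 8-bit
INTEGER_EXT       = 98  # signed 32-bit
SMALL_BIG_EXT     = 110 # bignum with up to 255 magnitude bytes
LARGE_BIG_EXT     = 111 # bignum with a 32-bit byte count

# One Integer branch replaces separate Fixnum/Bignum branches; the value's
# range, not its class, picks the tag.
def integer_tag(value)
  case value
  when Integer
    if (0..255).cover?(value)
      SMALL_INTEGER_EXT
    elsif (-(2**31)...2**31).cover?(value)
      INTEGER_EXT
    elsif value.abs < 256**255
      SMALL_BIG_EXT
    else
      LARGE_BIG_EXT
    end
  else
    raise TypeError, "not an integer: #{value.inspect}"
  end
end
```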
2017-01-11 15:00:27 +11:00
Carlos Martín Nieto
2c645f789f
The gem is actually yajl-ruby
2016-08-12 12:38:22 +02:00
Carlos Martín Nieto
778694ffd6
Add yajl as a dev dependency
We use it for the benchmarks, and it won't load if we haven't told
bundler we want it.
2016-08-12 12:24:49 +02:00
Carlos Martín Nieto
3edcf493b0
Merge pull request #8 from arthurschreiber/arthur/reduce-mem-usage
Don't use a StringIO when encoding data.
2016-08-12 12:23:38 +02:00
Arthur Schreiber
55cacadf2e
Don't use a StringIO when encoding data.
When data is encoded to BERT, each encoded piece is stored in an Array-based Buffer. At the end, each piece is written sequentially to a StringIO object and the underlying String is returned. Unfortunately, this sequential writing to StringIO makes the internal String object grow repeatedly. By calling `#join` on the Buffer's internal Array instead, Ruby allocates a single string that can hold the whole result in one step.
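A sketch of the two strategies compared above, assuming every piece is a plain binary String. `JoinBuffer` and `stringio_concat` are hypothetical names, not the gem's classes. On CRuby, `Array#join` sums the lengths of its String elements before allocating the result, which is what allows the single allocation.

```ruby
require 'stringio'

# Array-backed buffer: collect encoded pieces, concatenate once at the end.
class JoinBuffer
  def initialize
    @parts = []
  end

  def <<(piece)
    @parts << piece
    self
  end

  # A single Array#join sizes the result up front instead of growing it.
  def to_s
    @parts.join
  end
end

# The StringIO approach: each write may grow the underlying String.
def stringio_concat(parts)
  io = StringIO.new(String.new) # String.new is mutable and binary
  parts.each { |piece| io.write(piece) }
  io.string
end

pieces = ["\x83".b, "h".b, "hello".b]
buffer = JoinBuffer.new
pieces.each { |piece| buffer << piece }
```

Both produce the same bytes; the difference is only in how many intermediate reallocations happen along the way.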
2016-08-12 11:40:01 +02:00
Brian Lopez
c489ecd510
Merge pull request #6 from github/write-1-speed
speed up `write_1` calls
2016-08-11 08:56:45 -07:00
Aaron Patterson
63758fe5c6
speed up `write_1` calls
We don't need to use `pack`, just `chr`.
Before:
```
[aaron@TC bert (write-1-speed)]$ ruby -I lib bench/encode_bench.rb
                  user     system      total        real
BERT tiny     0.020000   0.000000   0.020000 (  0.014491)
BERT small    0.130000   0.000000   0.130000 (  0.140019)
BERT large    0.450000   0.170000   0.620000 (  0.627474)
BERT complex  2.940000   0.020000   2.960000 (  2.981667)
```
After:
```
[aaron@TC bert (write-1-speed)]$ ruby -I lib bench/encode_bench.rb
                  user     system      total        real
BERT tiny     0.010000   0.000000   0.010000 (  0.011318)
BERT small    0.110000   0.000000   0.110000 (  0.110367)
BERT large    0.380000   0.170000   0.550000 (  0.565794)
BERT complex  2.210000   0.020000   2.230000 (  2.243591)
```
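A sketch of the two ways to append a single byte, assuming `write_1` takes an Integer in 0..255 and appends it to a binary String. The helper names `write_1_pack` and `write_1_chr` are hypothetical. `[n].pack('C')` allocates a one-element Array and parses the `'C'` template on every call, while `Integer#chr` builds the one-byte String directly.

```ruby
# Append one byte via Array#pack: builds a temporary Array and parses 'C'.
def write_1_pack(buf, byte)
  buf << [byte].pack('C')
end

# Append one byte via Integer#chr: returns a one-byte String directly.
def write_1_chr(buf, byte)
  buf << byte.chr
end

via_pack = String.new # binary, mutable
via_chr  = String.new
(0..255).each do |n|
  write_1_pack(via_pack, n)
  write_1_chr(via_chr, n)
end
```

One behavioral difference to keep in mind: `pack('C')` silently wraps out-of-range values (256 becomes 0), while `256.chr` raises `RangeError`.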
2016-08-10 10:55:06 -07:00
Carlos Martín Nieto
d6abc9afc0
Merge pull request #5 from github/cmn/gemspec
Use rake-compiler
2016-08-10 19:45:34 +02:00
Carlos Martín Nieto
70220ecfdc
Bring back the smaller 'large' decode payload
The actually-large payload is too large to be of particular use.
2016-05-23 15:33:08 +02:00