Commit Graph

147 Commits

Author SHA1 Message Date
Laura Radville f3b2e729cb
Merge pull request #17 from github/radville/prepare-frozen-strings
Prepare for frozen strings: mark strings as mutable to allow us to set --enable-frozen-string-literal globally
2024-09-05 11:04:50 -04:00
Laura Radville d941a7db0e marks strings as mutable 2024-09-05 10:11:32 -04:00
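For context, a minimal sketch of what "marking a string as mutable" can look like once string literals are frozen globally. `build_header` is a hypothetical helper for illustration and is not taken from the bert source; the actual change may use a different idiom such as `String.new` or `dup`.

```ruby
# Sketch only: `build_header` is a hypothetical helper, not part of bert.
def build_header(version)
  # Unary plus returns a mutable string: the receiver itself when it is not
  # frozen, otherwise an unfrozen copy. This keeps working when the
  # interpreter runs with --enable-frozen-string-literal.
  buf = +"BERT"
  buf << "-v" << version.to_s # in-place append needs a mutable receiver
  buf
end

# Appending to a frozen string raises instead of mutating it.
frozen = "abc".freeze
begin
  frozen << "d"
rescue FrozenError => e # FrozenError exists since Ruby 2.5
  puts "cannot mutate: #{e.class}"
end

puts build_header(4)
```

Unary plus is convenient here because it behaves the same whether or not literals are frozen, so the code does not depend on the flag being set.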
Colin Stolley aa7e3f5d12
Merge pull request #16 from github/ccs-upgrade-deps
Upgrade yajl-ruby and rake.
2021-03-18 12:12:37 -07:00
Colin Stolley 2371d7102a Upgrade yajl-ruby and rake.
Dependabot said these dependencies contain security vulnerabilities,
so let's upgrade them.
2021-03-17 14:46:58 -05:00
Carlos Martín Nieto 2a626e0bf0
Merge pull request #15 from github/cmn/bert-as-module
decode: define BERT module instead of trying to look it up
2020-10-30 10:26:06 +01:00
Carlos Martín Nieto 59dd530be7 decode: do not keep a reference to classes across calls
We do not want to keep the `Bert::Tuple` and `Mochilo` references and re-use
those values across calls. This has never been something we should do, but with
the advent of the compacting GC this becomes even more important.

Look up these values if we need them to be used only inside a single call into
the C code.
2020-10-30 09:25:10 +01:00
Carlos Martín Nieto 901ef50958 decode: define BERT module instead of trying to look it up
We want to make sure we are dealing with a module as we expect. Looking up the
object seems to result in the runtime re-using the returned `VALUE` leading to
us calling `supports?` on arbitrary objects when trying to decode.

The existing pattern for re-opening modules is to use `rb_define_module` so
let's do this.
2020-10-29 17:19:47 +01:00
Carlos Martín Nieto f6727fc37b
Merge pull request #14 from github/slice-instead-of-dup
Use shared strings to decrease allocated bytes.
2018-10-12 16:58:52 +02:00
Aaron Patterson f6b34837c7
Apply same shared string optimization in Ruby 2018-10-11 08:56:36 -07:00
Aaron Patterson 5a447736d2
Use shared strings to decrease allocated bytes.
This commit uses string byteslice rather than rb_str_new so that we can
take advantage of Ruby's shared string optimizations.  Since we are
slicing all the way to the end of a buffer, we can use a trick which
lets us share byte buffers between string objects.

Here is the benchmark code:

```ruby
require 'bert'
require 'allocation_tracer'

zeroes = "0" * (1024*1024*50)

BERT::Encode.version = :v4

buf = BERT.encode zeroes

ObjectSpace::AllocationTracer.setup(%i{path line type})

result = ObjectSpace::AllocationTracer.trace do
  5.times { BERT.decode(buf) }
end

p [:file, :line, :type, :bytes]
result.each do |(file, line, type), info|
  p [file, line, type, info.last]
end
```

Here is the output before the change:

```
[aaron@TC-265 bert (slice-instead-of-dup)]$ be ruby -I lib:~/git/allocation_tracer/lib test.rb
[:file, :line, :type, :bytes]
["/Users/aaron/github/bert/lib/bert/decoder.rb", 8, :T_STRING, 209715374]
["/Users/aaron/github/bert/lib/bert/decoder.rb", 8, :T_IMEMO, 0]
```

Here is the output after this patch:

```
[aaron@TC-265 bert (slice-instead-of-dup)]$ be ruby -I lib:~/git/allocation_tracer/lib test.rb
[:file, :line, :type, :bytes]
["/Users/aaron/github/bert/lib/bert/decoder.rb", 8, :T_STRING, 52428961]
["/Users/aaron/github/bert/lib/bert/decoder.rb", 8, :T_IMEMO, 0]
```

Since we share string buffers, the decode method allocates much less
memory.
2018-10-10 11:50:44 -07:00
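A small sketch of the slicing pattern this commit describes, assuming a hypothetical `tail_bytes` helper (the name does not come from the bert source). Whether the returned string actually shares its buffer with the original is a CRuby implementation detail, so the sketch only demonstrates that the slice has the expected contents.

```ruby
# Hypothetical helper: return everything in `buf` from byte `offset` onward.
# Slicing through to the end of the buffer is the case the commit message
# says lets Ruby share the underlying bytes rather than copy them.
def tail_bytes(buf, offset)
  buf.byteslice(offset, buf.bytesize - offset)
end

header  = "\x83".b          # illustrative one-byte prefix
payload = ("0" * 1024).b    # binary payload following the prefix
buf     = header + payload

tail = tail_bytes(buf, header.bytesize)
puts tail.bytesize
puts tail == payload
```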
Carlos Martín Nieto ca51092ef6 Merge pull request #12 from spraints/msgpack
Add v3 and v4 encodings, using mochilo
2017-06-12 14:25:49 +02:00
Matt Burke fbae1a156c brianmario/mochilo#20 is merged 2017-06-06 16:37:02 -04:00
Matt Burke 10bb1e3203 Update dependencies so that a new enough mochilo is used
In the gemspec, limit the mochilo version to something no older than
1.3 (brianmario/mochilo#20) for the 1.x series, and no older than 2.1
(brianmario/mochilo#19) for the 2.x series.

In the main Gemfile and in the v2 CI gemfile, bundle from github.com
because there hasn't been a gem cut yet.

In the v1 CI gemfile, bundle from the PR branch.
2017-06-06 16:08:36 -04:00
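The version rule described above can be sketched as a plain predicate. `mochilo_version_ok?` is a hypothetical name, the gemspec itself is not shown in this log, and rejecting every major version other than 1 and 2 is an assumption made for the example.

```ruby
require "rubygems"

# Hypothetical check mirroring the rule in the commit message:
# 1.x must be at least 1.3, 2.x must be at least 2.1.
# Assumption: any other major version is rejected.
def mochilo_version_ok?(version_string)
  version = Gem::Version.new(version_string)
  case version.segments.first
  when 1 then version >= Gem::Version.new("1.3")
  when 2 then version >= Gem::Version.new("2.1")
  else false
  end
end

%w[1.2.9 1.3 2.0.5 2.1.0].each do |candidate|
  puts "#{candidate}: #{mochilo_version_ok?(candidate)}"
end
```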
Matt Burke c4a1f2ca03 mochilo v1 == bert v3, mochilo v2 == bert v4 2017-05-30 12:32:32 -04:00
Matt Burke f36d0c6ad5 Use a varying mochilo version in CI 2017-05-30 09:47:15 -04:00
Matt Burke 7ae632e0d5 Bump mochilo 2017-05-26 16:11:53 -04:00
Matt Burke 71b04468f7 C implementation for v3
Even though the C implementation calls back to Ruby when it runs
mochilo, the v3 bert decoder is still faster in C than Ruby.

BERT_TEST_IMPL=C bundle exec ruby bench/bench.rb:

                                     user     system      total        real
BERT v1 tiny                     0.000000   0.000000   0.000000 (  0.009276)
BERT v1 small                    0.070000   0.000000   0.070000 (  0.091471)
BERT v1 large                    0.310000   0.030000   0.340000 (  0.342087)
BERT v1 complex                  1.130000   0.010000   1.140000 (  1.144703)
BERT v1 long array               2.240000   0.010000   2.250000 (  2.257003)
BERT v2 tiny                     0.010000   0.000000   0.010000 (  0.008312)
BERT v2 small                    0.060000   0.000000   0.060000 (  0.060999)
BERT v2 large                    0.320000   0.020000   0.340000 (  0.346957)
BERT v2 complex                  1.140000   0.000000   1.140000 (  1.135720)
BERT v2 long array               2.020000   0.000000   2.020000 (  2.037739)
BERT v3 tiny                     0.000000   0.000000   0.000000 (  0.003123)
BERT v3 small                    0.000000   0.000000   0.000000 (  0.003621)
BERT v3 large                    0.240000   0.040000   0.280000 (  0.289285)
BERT v3 complex                  0.040000   0.000000   0.040000 (  0.036944)
BERT v3 long array               0.170000   0.000000   0.170000 (  0.170969)

BERT_TEST_IMPL=Ruby bundle exec ruby bench/bench.rb:

                                     user     system      total        real
BERT v1 tiny                     0.010000   0.000000   0.010000 (  0.013944)
BERT v1 small                    0.130000   0.000000   0.130000 (  0.124755)
BERT v1 large                    0.560000   0.010000   0.570000 (  0.571472)
BERT v1 complex                  2.750000   0.000000   2.750000 (  2.758103)
BERT v1 long array               5.440000   0.000000   5.440000 (  5.476571)
BERT v2 tiny                     0.020000   0.000000   0.020000 (  0.017456)
BERT v2 small                    0.140000   0.000000   0.140000 (  0.136073)
BERT v2 large                    0.640000   0.000000   0.640000 (  0.648367)
BERT v2 complex                  2.610000   0.000000   2.610000 (  2.627479)
BERT v2 long array               5.490000   0.010000   5.500000 (  5.519414)
BERT v3 tiny                     0.010000   0.000000   0.010000 (  0.003783)
BERT v3 small                    0.000000   0.000000   0.000000 (  0.005775)
BERT v3 large                    0.240000   0.010000   0.250000 (  0.251264)
BERT v3 complex                  0.040000   0.000000   0.040000 (  0.037354)
BERT v3 long array               0.180000   0.000000   0.180000 (  0.183192)
2017-05-26 15:18:23 -04:00
Matt Burke a32b6cc101 Use brianmario/mochilo#19
The long array time was:
BERT v3 long array               0.520000   0.000000   0.520000 (  0.522735)

and now it is:
BERT v3 long array               0.190000   0.000000   0.190000 (  0.183106)

The full benchmark output:
                                     user     system      total        real
BERT v1 tiny                     0.010000   0.000000   0.010000 (  0.013550)
BERT v1 small                    0.120000   0.020000   0.140000 (  0.124408)
BERT v1 large                    0.520000   0.000000   0.520000 (  0.534007)
BERT v1 complex                  2.730000   0.010000   2.740000 (  2.758483)
BERT v1 long array               5.150000   0.030000   5.180000 (  5.202551)
BERT v2 tiny                     0.020000   0.000000   0.020000 (  0.019604)
BERT v2 small                    0.120000   0.000000   0.120000 (  0.121305)
BERT v2 large                    0.630000   0.000000   0.630000 (  0.630657)
BERT v2 complex                  2.590000   0.000000   2.590000 (  2.586704)
BERT v2 long array               5.390000   0.020000   5.410000 (  5.453956)
BERT v3 tiny                     0.000000   0.000000   0.000000 (  0.005028)
BERT v3 small                    0.000000   0.010000   0.010000 (  0.006714)
BERT v3 large                    0.220000   0.040000   0.260000 (  0.259619)
BERT v3 complex                  0.040000   0.000000   0.040000 (  0.040061)
BERT v3 long array               0.190000   0.000000   0.190000 (  0.183106)
JSON tiny                        0.000000   0.000000   0.000000 (  0.008714)
JSON small                       0.020000   0.000000   0.020000 (  0.013369)
JSON large                       2.280000   0.010000   2.290000 (  2.332599)
JSON complex                     0.150000   0.000000   0.150000 (  0.145547)
JSON long array                  0.530000   0.000000   0.530000 (  0.526635)
YAJL tiny                        0.000000   0.000000   0.000000 (  0.005164)
YAJL small                       0.020000   0.000000   0.020000 (  0.016937)
YAJL large                       0.940000   0.030000   0.970000 (  0.987986)
YAJL complex                     0.230000   0.010000   0.240000 (  0.233397)
YAJL long array                  0.710000   0.000000   0.710000 (  0.710261)
Ruby tiny                        0.000000   0.000000   0.000000 (  0.003004)
Ruby small                       0.000000   0.000000   0.000000 (  0.005335)
Ruby large                       0.020000   0.000000   0.020000 (  0.010745)
Ruby complex                     0.010000   0.000000   0.010000 (  0.014781)
Ruby long array                  0.030000   0.010000   0.040000 (  0.033941)
2017-05-26 14:49:06 -04:00
Matt Burke b5e83ef0d8 Serialize symbol, time, regexp in the benchmark, too
                                     user     system      total        real
BERT v1 tiny                     0.010000   0.000000   0.010000 (  0.013280)
BERT v1 small                    0.110000   0.000000   0.110000 (  0.116110)
BERT v1 large                    0.630000   0.010000   0.640000 (  0.646742)
BERT v1 complex                  2.710000   0.000000   2.710000 (  2.738346)
BERT v1 long array               5.360000   0.000000   5.360000 (  5.391206)
BERT v2 tiny                     0.020000   0.000000   0.020000 (  0.018202)
BERT v2 small                    0.140000   0.000000   0.140000 (  0.133662)
BERT v2 large                    0.640000   0.010000   0.650000 (  0.660656)
BERT v2 complex                  2.520000   0.010000   2.530000 (  2.534638)
BERT v2 long array               5.460000   0.010000   5.470000 (  5.520119)
BERT v3 tiny                     0.010000   0.000000   0.010000 (  0.010693)
BERT v3 small                    0.010000   0.000000   0.010000 (  0.011390)
BERT v3 large                    0.220000   0.030000   0.250000 (  0.246068)
BERT v3 complex                  0.060000   0.000000   0.060000 (  0.055431)
BERT v3 long array               0.520000   0.000000   0.520000 (  0.522735)
JSON tiny                        0.000000   0.000000   0.000000 (  0.005991)
JSON small                       0.020000   0.000000   0.020000 (  0.010092)
JSON large                       2.200000   0.020000   2.220000 (  2.252311)
JSON complex                     0.160000   0.000000   0.160000 (  0.159742)
JSON long array                  0.450000   0.010000   0.460000 (  0.454021)
YAJL tiny                        0.010000   0.000000   0.010000 (  0.006357)
YAJL small                       0.020000   0.000000   0.020000 (  0.023327)
YAJL large                       0.960000   0.020000   0.980000 (  1.002778)
YAJL complex                     0.280000   0.030000   0.310000 (  0.317845)
YAJL long array                  0.750000   0.000000   0.750000 (  0.747592)
Ruby tiny                        0.000000   0.000000   0.000000 (  0.003547)
Ruby small                       0.010000   0.000000   0.010000 (  0.007403)
Ruby large                       0.010000   0.000000   0.010000 (  0.012739)
Ruby complex                     0.020000   0.000000   0.020000 (  0.015415)
Ruby long array                  0.030000   0.000000   0.030000 (  0.036519)
2017-05-26 14:48:03 -04:00
Matt Burke 5c272f0635 v4 (mochilo) -> v3 2017-04-27 14:30:54 -04:00
Matt Burke f7852d7712 remove byebug dependency 2017-04-24 17:58:16 -04:00
Matt Burke 40e478c76b Make v4, using mochilo
mochilo is like msgpack, except that it deals with encodings in a
sane(r?) way. mochilo doesn't have a custom type registry, so I've added
one in the "custom-type-registry" branch, and I'm using that here.

This ended up being faster than the msgpack implementation, because
encoding doesn't have to pop up to ruby to deal with strings.

                                     user     system      total        real
BERT v2 tiny                     0.010000   0.000000   0.010000 (  0.015289)
BERT v2 small                    0.130000   0.000000   0.130000 (  0.126682)
BERT v2 large                    0.580000   0.000000   0.580000 (  0.592555)
BERT v2 complex                  2.880000   0.010000   2.890000 (  2.903082)
BERT v2 long array               4.200000   0.000000   4.200000 (  4.262165)

BERT v3 tiny                     0.020000   0.000000   0.020000 (  0.020252)
BERT v3 small                    0.020000   0.000000   0.020000 (  0.024444)
BERT v3 large                    0.700000   0.010000   0.710000 (  0.703152)
BERT v3 complex                  0.090000   0.000000   0.090000 (  0.100372)
BERT v3 long array               1.810000   0.010000   1.820000 (  1.827934)

BERT v4 tiny                     0.010000   0.000000   0.010000 (  0.013301)
BERT v4 small                    0.010000   0.000000   0.010000 (  0.012317)
BERT v4 large                    0.230000   0.060000   0.290000 (  0.292852)
BERT v4 complex                  0.070000   0.000000   0.070000 (  0.064578)
BERT v4 long array               0.110000   0.000000   0.110000 (  0.110842)

Msgpack tiny                     0.000000   0.000000   0.000000 (  0.001644)
Msgpack small                    0.010000   0.000000   0.010000 (  0.002568)
Msgpack large                    0.180000   0.000000   0.180000 (  0.192238)
Msgpack complex                  0.040000   0.000000   0.040000 (  0.041482)
Msgpack long array               0.070000   0.000000   0.070000 (  0.065986)
2017-04-24 17:49:30 -04:00
Matt Burke 70d85c7dbc Use a modified msgpack, so that String can be customized
This is slower by a lot.

< BERT v3 long array               0.080000   0.000000   0.080000 (  0.083473)
> BERT v3 long array               1.960000   0.000000   1.960000 (  1.981249)
2017-04-24 15:45:13 -04:00
Matt Burke 5c88d933e0 benchmark the new BERT encoder
It's slightly slower than plain msgpack, but still much better than
older BERT.

                                     user     system      total        real
BERT v1 tiny                     0.020000   0.000000   0.020000 (  0.019957)
BERT v1 small                    0.150000   0.000000   0.150000 (  0.141066)
BERT v1 large                    0.710000   0.000000   0.710000 (  0.723820)
BERT v1 complex                  2.950000   0.000000   2.950000 (  2.972712)
BERT v1 long array               3.870000   0.000000   3.870000 (  3.898899)
BERT v2 tiny                     0.010000   0.000000   0.010000 (  0.017890)
BERT v2 small                    0.160000   0.000000   0.160000 (  0.159581)
BERT v2 large                    0.720000   0.000000   0.720000 (  0.733793)
BERT v2 complex                  2.960000   0.010000   2.970000 (  2.975499)
BERT v2 long array               4.080000   0.000000   4.080000 (  4.106795)
BERT v3 tiny                     0.020000   0.000000   0.020000 (  0.024472)
BERT v3 small                    0.020000   0.000000   0.020000 (  0.025297)
BERT v3 large                    0.320000   0.020000   0.340000 (  0.337157)
BERT v3 complex                  0.090000   0.000000   0.090000 (  0.085884)
BERT v3 long array               0.100000   0.000000   0.100000 (  0.104439)
JSON tiny                        0.010000   0.000000   0.010000 (  0.007604)
JSON small                       0.010000   0.000000   0.010000 (  0.014245)
JSON large                       2.450000   0.010000   2.460000 (  2.473644)
JSON complex                     0.170000   0.000000   0.170000 (  0.174540)
JSON long array                  0.350000   0.000000   0.350000 (  0.357081)
YAJL tiny                        0.010000   0.000000   0.010000 (  0.005566)
YAJL small                       0.020000   0.000000   0.020000 (  0.019177)
YAJL large                       1.060000   0.030000   1.090000 (  1.094342)
YAJL complex                     0.300000   0.000000   0.300000 (  0.307467)
YAJL long array                  0.140000   0.000000   0.140000 (  0.136261)
Ruby tiny                        0.000000   0.000000   0.000000 (  0.002919)
Ruby small                       0.010000   0.000000   0.010000 (  0.008211)
Ruby large                       0.000000   0.010000   0.010000 (  0.009826)
Ruby complex                     0.020000   0.000000   0.020000 (  0.013560)
Ruby long array                  0.080000   0.000000   0.080000 (  0.082170)
Msgpack tiny                     0.000000   0.000000   0.000000 (  0.002561)
Msgpack small                    0.000000   0.000000   0.000000 (  0.003771)
Msgpack large                    0.190000   0.000000   0.190000 (  0.195700)
Msgpack complex                  0.040000   0.010000   0.050000 (  0.044514)
Msgpack long array               0.070000   0.000000   0.070000 (  0.071905)
2017-04-21 10:36:27 -04:00
Matt Burke 2fb106607f do less processing on a msgpackable object
* skip 'convert', because the decomposition isn't interesting.
* register more types, so that msgpack can deal with all the types that
  BERT can
2017-04-21 10:33:45 -04:00
Matt Burke d733ee9eac set up serialization of symbols 2017-04-21 10:00:09 -04:00
Matt Burke c5268506d2 decode msgpack-based bert 2017-04-21 09:45:05 -04:00
Matt Burke b710534fa3 Make a basic v3(msgpack) encoder 2017-04-21 09:39:57 -04:00
Matt Burke 9ed0dc3515 bert_test: restructure to prep for v3 tests 2017-04-21 09:16:21 -04:00
Matt Burke 4fa5e2e2d3 Benchmark dat msgpack 2017-04-21 09:03:53 -04:00
Charlie Somerville 1405542759 Merge pull request #11 from github/ci
Set up Travis CI
2017-01-12 12:17:20 +11:00
Charlie Somerville ca38663411 only serialise a Time object with microsecond precision
Time.now on Linux returns a Time instance with nanosecond precision,
but BERT only serialises times down to microsecond precision.
2017-01-12 12:13:18 +11:00
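To illustrate the precision mismatch, here is a minimal sketch that drops the sub-microsecond part of a `Time`. `truncate_to_usec` is a hypothetical helper and may not match how the commit itself handles it.

```ruby
# Hypothetical helper: keep whole seconds and microseconds, discard the rest.
# Note that Time.at returns a time in the local zone.
def truncate_to_usec(time)
  Time.at(time.to_i, time.usec)
end

# Build a time with nanosecond detail: 123456.789 microseconds past the second.
precise = Time.at(1_484_183_598, Rational(123_456_789, 1000))
rounded = truncate_to_usec(precise)

puts precise.nsec
puts rounded.nsec
```

A value truncated this way should compare equal to itself after a round trip through a format that stores only microseconds, which is the property a serialisation test would rely on.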
Charlie Somerville 57c7b1de79 test ruby and c implementations 2017-01-12 12:13:18 +11:00
Charlie Somerville 59692ed4a5 set up .travis.yml with ruby 2.3 and 2.4 2017-01-12 12:13:18 +11:00
Charlie Somerville 38fdbbf98d fix unused variable warnings when testing with -w 2017-01-12 12:13:18 +11:00
Charlie Somerville 0dcc7eee8f Merge pull request #9 from github/integer-unification
Ruby 2.4 support
2017-01-12 12:12:09 +11:00
Charlie Somerville 83ffa7f301 Merge pull request #10 from arthurschreiber/arthur/fix-encoding-failures
Make sure we only collect binary encoded strings in our buffer.
2017-01-12 12:05:00 +11:00
Arthur Schreiber 76fb02aca8 Make sure we only collect binary encoded strings in our buffer. 2017-01-11 10:10:14 +01:00
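A sketch of why mixed encodings in a buffer matter, assuming a hypothetical `join_as_binary` helper rather than bert's real buffer code. Joining a binary string that holds high bytes with a non-ASCII UTF-8 string raises, while converting every piece to binary first does not.

```ruby
# Hypothetical helper: tag every piece as binary before joining.
# String#b returns a copy of the string with ASCII-8BIT encoding.
def join_as_binary(strings)
  strings.map(&:b).join
end

mixed = ["\xFF".b, "\u00e9"] # binary high byte plus a UTF-8 "e with acute"

begin
  mixed.join
rescue Encoding::CompatibilityError => e
  puts "plain join failed: #{e.class}"
end

puts join_as_binary(mixed).bytesize
```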
Charlie Somerville 3c96b8154c add test-unit as development dependency 2017-01-11 15:05:26 +11:00
Charlie Somerville 0aa53733e4 update rake dev dependency for ruby 2.4 2017-01-11 15:02:28 +11:00
Charlie Somerville 5ff7e73aa5 update yajl-ruby dev dependency for ruby 2.4 2017-01-11 15:01:37 +11:00
Charlie Somerville d8de16106a replace Fixnum and Bignum with Integer 2017-01-11 15:00:27 +11:00
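Ruby 2.4 unified `Fixnum` and `Bignum` into `Integer`, so code that used to branch on the class has to branch on the value instead. The sketch below shows that idea with a hypothetical `integer_tag` method; the ranges are illustrative and are not taken from the bert encoder.

```ruby
# Hypothetical classifier: pick an encoding tag from the value's magnitude
# rather than from its class. The cutoffs are illustrative assumptions.
def integer_tag(value)
  raise TypeError, "expected Integer, got #{value.class}" unless value.is_a?(Integer)

  if (0..255).cover?(value)
    :small_int
  elsif (-(1 << 31)..((1 << 31) - 1)).cover?(value)
    :int
  else
    :bignum
  end
end

[7, -1, 2**64].each { |n| puts "#{n}: #{integer_tag(n)}" }
```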
Carlos Martín Nieto 2c645f789f The gem is actually yajl-ruby 2016-08-12 12:38:22 +02:00
Carlos Martín Nieto 778694ffd6 Add yajl as a dev dependency
We use it for the benchmarks, and it won't load if we haven't told
bundler we want it.
2016-08-12 12:24:49 +02:00
Carlos Martín Nieto 3edcf493b0 Merge pull request #8 from arthurschreiber/arthur/reduce-mem-usage
Don't use a StringIO when encoding data.
2016-08-12 12:23:38 +02:00
Arthur Schreiber 55cacadf2e Don't use a StringIO when encoding data.
When data is encoded to BERT, each individual, encoded result piece is stored inside an Array based Buffer. At the end, each piece is sequentially written out to a StringIO object and the underlying String is returned. Unfortunately, this sequential writing to StringIO causes a lot of growth of the internal String object. By calling `#join` on the Buffer internal Array, Ruby will allocate a single string that can contain the whole result in a single step.
2016-08-12 11:40:01 +02:00
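The two strategies this commit compares can be sketched side by side. Both method names are hypothetical and the pieces are made-up sample data; the sketch shows only that the results are identical, not the allocation difference the commit message reports.

```ruby
require "stringio"

# Old approach as described: write each encoded piece to a StringIO in turn,
# which may grow the backing string several times.
def concat_with_stringio(pieces)
  io = StringIO.new(String.new) # String.new gives a mutable binary string
  pieces.each { |piece| io.write(piece) }
  io.string
end

# New approach as described: let Array#join build the result in one step.
def concat_with_join(pieces)
  pieces.join
end

pieces = ["\x83".b, "m".b, "hello".b] # illustrative binary fragments
puts concat_with_stringio(pieces) == concat_with_join(pieces)
```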
Brian Lopez c489ecd510 Merge pull request #6 from github/write-1-speed
speed up `write_1` calls
2016-08-11 08:56:45 -07:00
Aaron Patterson 63758fe5c6
speed up `write_1` calls
We don't need to use `pack`, just `chr`.

Before:

```
[aaron@TC bert (write-1-speed)]$ ruby -I lib bench/encode_bench.rb
                    user     system      total        real
BERT tiny       0.020000   0.000000   0.020000 (  0.014491)
BERT small      0.130000   0.000000   0.130000 (  0.140019)
BERT large      0.450000   0.170000   0.620000 (  0.627474)
BERT complex    2.940000   0.020000   2.960000 (  2.981667)
```

After:

```
[aaron@TC bert (write-1-speed)]$ ruby -I lib bench/encode_bench.rb
                    user     system      total        real
BERT tiny       0.010000   0.000000   0.010000 (  0.011318)
BERT small      0.110000   0.000000   0.110000 (  0.110367)
BERT large      0.380000   0.170000   0.550000 (  0.565794)
BERT complex    2.210000   0.020000   2.230000 (  2.243591)
```
2016-08-10 10:55:06 -07:00
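A minimal sketch of the substitution this commit describes, using hypothetical method names rather than the real `write_1` body. For a single unsigned byte, `Integer#chr` and `Array#pack("C")` yield the same byte, and `chr` skips building a temporary array and parsing a format string.

```ruby
# Hypothetical "before" version: pack one unsigned byte.
def write_1_pack(byte)
  [byte].pack("C")
end

# Hypothetical "after" version: ask the integer for its one-byte string.
# For values below 128 the result is tagged US-ASCII rather than binary,
# but it still compares equal to the packed string.
def write_1_chr(byte)
  byte.chr
end

puts (0..255).all? { |b| write_1_pack(b) == write_1_chr(b) }
```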
Carlos Martín Nieto d6abc9afc0 Merge pull request #5 from github/cmn/gemspec
Use rake-compiler
2016-08-10 19:45:34 +02:00
Carlos Martín Nieto 70220ecfdc Bring back the smaller 'large' decode payload
The actually-large payload is too large to be of particular use.
2016-05-23 15:33:08 +02:00