Don't use a StringIO when encoding data.

When data is encoded to BERT, each encoded piece of the result is stored in an Array-based Buffer. At the end, the pieces are written one by one to a StringIO object and the underlying String is returned. Unfortunately, these sequential writes make the internal String grow repeatedly. Calling `#join` on the Buffer's internal Array instead lets Ruby allocate a single String large enough for the whole result in one step.
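
The difference between the two approaches can be sketched outside the library. This is a minimal illustration, not the gem's code: the `pieces` array and its contents are hypothetical stand-ins for the Buffer's internal Array, and the sketch only checks that both approaches produce the same bytes.

```ruby
require "stringio"

# Hypothetical encoded pieces, standing in for the Buffer's internal Array.
pieces = ["\x83".b, "m".b, [5].pack("N"), "hello".b]

# Old approach: write each piece to a binary StringIO in turn,
# then return the underlying String.
io = StringIO.new
io.set_encoding("binary") if io.respond_to?(:set_encoding)
pieces.each { |piece| io.write(piece) }
via_stringio = io.string

# New approach: build the whole result with a single Array#join call.
via_join = pieces.join("")

puts via_stringio.bytes == via_join.bytes # => true
puts via_join.bytesize                    # => 11
```

Both paths yield identical bytes, so the change should only affect how the result String is built, not what it contains.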
Arthur Schreiber 2016-08-12 11:40:01 +02:00
Parent d6abc9afc0
Commit 55cacadf2e
1 changed file with 5 additions and 5 deletions

@@ -62,6 +62,10 @@ module BERT
   @buf.each { |x| io.write x }
 end
+def to_s
+  @buf.join("")
+end
 def bytesize
   @buf.map(&:bytesize).inject :+
 end
@@ -74,11 +78,7 @@ module BERT
 end
 def self.encode(data)
-  buf = encode_to_buffer data
-  io = StringIO.new
-  io.set_encoding('binary') if io.respond_to?(:set_encoding)
-  buf.write_to io
-  io.string
+  encode_to_buffer(data).to_s
 end
 def self.encode_data(data, io)