[Truffle] Enhanced byte array. #4397

nirvdrum · 2016-12-18T23:31:43Z

Full disclosure: I don't have confidence I've accounted for every change in przlib here. However, with the change we still pass the zlib specs we passed prior to this change and I've managed to successfully install a gem with this code in place. So, while there may very well be something broken, we are able to run at least one complicated use case without incident.

The basic idea here is przlib really expects String to function well as a byte buffer. E.g., it'll allocate a 64 KB string and then execute String#setbyte on each position. Currently, with ropes, this results in the construction of 64K ConcatRope instances and degrades to linked list performance for any String#getbyte calls.
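For illustration, the access pattern described above looks roughly like the following. This is a minimal sketch rather than przlib's actual code; the variable names are made up for the example. On an implementation with flat, mutable Strings each `setbyte` is a single in-place write, which is the behaviour przlib relies on.

```ruby
# Hypothetical sketch of the "String as byte buffer" pattern (not przlib's code).
size = 64 * 1024

# Allocate a 64 KB binary string filled with zero bytes.
buffer = "\0".b * size

# Write every position individually, as przlib does with String#setbyte.
size.times do |i|
  buffer.setbyte(i, i & 0xFF)
end

# Read every position back with String#getbyte.
total = 0
size.times { |i| total += buffer.getbyte(i) }
```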

There are likely improvements we can make to ropes to better support this use case, but we need zlib working now. I anticipate all of this code being replaced with something native, so I took some liberties in modifying files in place. If we need to, we can also revert the commit or merge later on.

I've also added specs for Rubinius::ByteArray. They're not comprehensive, and neither is the API: it's for internal use, so I don't think we need to support all the various call forms that String does. E.g., I rewrote any form of String#[Range] to Rubinius::ByteArray[index, length].
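As a rough sketch of that rewrite, a Range-based slice can be expressed as an index and a length. The helper name `range_to_index_length` is hypothetical and only handles non-negative, bounded ranges; the example uses a plain String, since Rubinius::ByteArray is internal to the implementation.

```ruby
# Hypothetical helper: convert a Range into the [index, length] call form.
# Assumes non-negative integer bounds; negative or endless ranges are not handled.
def range_to_index_length(range)
  index = range.begin
  length = range.end - index
  length += 1 unless range.exclude_end?
  [index, length]
end

data = "abcdefgh".b

index, length = range_to_index_length(2..5)
inclusive_matches = data[2..5] == data[index, length]

index, length = range_to_index_length(2...5)
exclusive_matches = data[2...5] == data[index, length]
```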

Prior to this change, przlib used Strings as byte buffers. This owes a lot to the legacy of Ruby Strings being little more than dumb byte arrays in Ruby < 1.9. However, these usage patterns do not work terribly well with alternative String representations, such as ropes. By switching to a dedicated byte buffer type we can achieve much better performance than with rope-backed Strings and use a more suitable API.

bjfish · 2016-12-18T23:49:21Z

I'm curious if it fixes the case mentioned here: #4133

chrisseaton

I haven't been annoyed by zlib. Have you been annoyed enough to make this worthwhile?

Why can't we use a String that uses a mutable byte[] instead of a rope? We already have a mutable rope for that, don't we? przlib won't be the only code that doesn't play well with our current system for ropes.

chrisseaton · 2016-12-19T07:09:54Z

truffle/src/main/ruby/core/byte_array.rb

+
+module Rubinius
+class ByteArray
+


Do we not usually indent twice?

None of the nested Rubinius module code is indented twice. I just followed that style. E.g., see the various Rubinius::FFI classes.

OTOH, string_mirror.rb and process_mirror.rb indent. I think it's better to indent, as that's the most common style for Ruby code.

pitr-ch

Seems good as a temporary solution. How difficult would it be to replace the whole rope tree with a mutable version after it reaches a certain depth? Although if this is merged, we lose a good pathological test case for ropes.
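One way to picture the depth-based fallback asked about here is the toy rope below. It is only a sketch under assumptions: the `Leaf` and `Concat` classes, the `concat` helper, and the `MAX_DEPTH` threshold are all hypothetical and unrelated to the real rope classes in the codebase. It flattens into an immutable leaf rather than switching to a mutable representation, but the trigger is the same idea.

```ruby
# Toy rope, for illustration only (hypothetical classes, not the real ropes).
class Leaf
  attr_reader :bytes

  def initialize(bytes)
    @bytes = bytes
  end

  def depth
    0
  end

  def to_s
    @bytes
  end
end

class Concat
  attr_reader :left, :right, :depth

  def initialize(left, right)
    @left = left
    @right = right
    @depth = [left.depth, right.depth].max + 1
  end

  # Flatten iteratively so a deep tree cannot overflow the call stack.
  def to_s
    out = String.new
    stack = [self]
    until stack.empty?
      node = stack.pop
      if node.is_a?(Concat)
        # Push right first so the left child is visited first.
        stack.push(node.right, node.left)
      else
        out << node.bytes
      end
    end
    out
  end
end

MAX_DEPTH = 32 # arbitrary threshold chosen for the example

# Concatenate two ropes, collapsing to a flat leaf once the tree gets too deep.
def concat(left, right)
  node = Concat.new(left, right)
  node.depth > MAX_DEPTH ? Leaf.new(node.to_s) : node
end

rope = Leaf.new("")
1000.times { |i| rope = concat(rope, Leaf.new((i % 10).to_s)) }
```

With the threshold in place, repeated single-byte appends never leave a tree deeper than `MAX_DEPTH`, at the cost of an occasional full copy.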

eregon · 2016-12-19T11:17:13Z

spec/truffle/specs/truffle/byte_array/index_spec.rb

+  describe 'with index and length' do
+
+    it 'should return the index corresponding to the first occurrence of the value' do
+


Traditionally in specs, there is no extra vertical space before/after it, describe or around the example code.
IMHO it hurts readability.

eregon · 2016-12-19T11:19:49Z

truffle/src/main/java/org/jruby/truffle/core/rubinius/ByteArrayNodes.java

        @Specialization
        public DynamicObject allocate(DynamicObject rubyClass) {
-            throw new RaiseException(coreExceptions().typeErrorAllocatorUndefinedFor(rubyClass, this));
+            return allocateObjectNode.allocate(rubyClass, ByteList.EMPTY_BYTELIST);


This sounds dangerous, the EMPTY_BYTELIST might not remain empty very long.

It's immediately replaced in initialize. I can make the field @Nullable instead.

You could define ByteArray.new so allocation and initialization would not need to be split. @Nullable seems fine too.
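The hazard raised in this thread is a general one: a shared "empty" constant that is actually mutable can be modified through any object that aliases it. The Ruby analogy below is a sketch only; the class and constant names are hypothetical and it does not reproduce the Java code under review.

```ruby
# Hypothetical analogy of sharing one mutable "empty" buffer between instances.
EMPTY_BYTES = String.new # mutable, shared by every SharedBuffer

class SharedBuffer
  attr_reader :bytes

  def initialize
    @bytes = EMPTY_BYTES # aliases the shared constant
  end
end

class OwnedBuffer
  attr_reader :bytes

  def initialize
    @bytes = String.new # each instance gets its own storage
  end
end

a = SharedBuffer.new
b = SharedBuffer.new
a.bytes << "x" # also changes what b sees, and the "empty" constant itself

c = OwnedBuffer.new
d = OwnedBuffer.new
c.bytes << "y" # d is unaffected
```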

eregon · 2016-12-19T12:13:25Z

Wild idea: could we just use an Array instead of ByteArray here? It would take more space in memory, but require no new code except changes in pr-zlib.

chrisseaton · 2016-12-19T12:13:59Z

Hmmm yeah. And add a new byte specialisation to Array?

eregon · 2016-12-19T12:15:27Z

I'd rather not deal with byte in Ruby, so I was thinking int[].

nirvdrum · 2016-12-19T12:15:51Z

I could look at that again. I started going down that path, but encountered quite a few places where things needed to change, and we already had a specialized version of array for bytes, so I enhanced that one. I was trying to avoid changing method calls in przlib too much, as they expect a string-like API.
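For comparison, the Array-based alternative discussed in this thread might look like the sketch below. The variable names are made up for the example. It shows why call sites would need to change: bytes are plain Integers in an Array, and a String is only produced at the end with `Array#pack`.

```ruby
# Hypothetical sketch: an Array of Integers used as a byte buffer.
bytes = Array.new(16, 0)

bytes[3] = 0x41          # replaces String#setbyte
bytes[4, 3] = [1, 2, 3]  # overwrite three positions with three values

window = bytes[3, 4]     # [index, length] slicing returns an Array, not a String

# Convert to a binary String only when one is actually needed.
packed = bytes.pack("C*")
```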

eregon · 2016-12-19T12:19:08Z

Seems like an OK workaround, but we probably want a native zlib quickly, at least for the most common methods.
Ropes will need to handle some fallback when getting too deep, but I guess that can wait a bit longer.

nirvdrum · 2016-12-19T12:20:40Z

I needed to do this because we can't install gems with zlib as it stands. Maybe increasing the heap will do the trick, but as I mentioned in the PR description it creates very deep ropes and is incredibly slow (assuming you don't run out of memory first).

The mutable rope is extremely limited. I started down that path, but it was quite invasive. This is meant to be a patch over until we have a better story for mutable byte[] backed strings.

nirvdrum · 2016-12-21T12:01:11Z

Improvements made to ropes have drastically reduced the performance gap that this PR was intended to address. While this branch is still faster (14s vs 17s for zlib specs), it's probably not a strong enough improvement to justify its cost.

nirvdrum added 2 commits December 18, 2016 18:21

[Truffle] Enhanced Rubinius::ByteArray to match more of the String API.

ac82a4f

nirvdrum added the truffle label Dec 18, 2016

nirvdrum added this to the truffle-dev milestone Dec 18, 2016

nirvdrum requested review from bjfish, eregon, pitr-ch and chrisseaton December 18, 2016 23:31

chrisseaton reviewed Dec 19, 2016

pitr-ch reviewed Dec 19, 2016

eregon reviewed Dec 19, 2016

nirvdrum closed this Dec 21, 2016

nirvdrum deleted the truffle-enhanced-byte-array branch December 21, 2016 12:05

nirvdrum referenced this pull request in oracle/truffleruby Jan 30, 2017

Implement ByteArray.allocate

74aef54

enebo added this to the Non-Release milestone Dec 7, 2017
