Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Truffle] Enhanced byte array. #4397

Closed
wants to merge 2 commits into from

Conversation

nirvdrum
Copy link
Contributor

Full disclosure: I don't have confidence I've accounted for every change in przlib here. However, with the change we still pass the zlib specs we passed prior to this change and I've managed to successfully install a gem with this code in place. So, while there may very well be something broken, we are able to run at least one complicated use case without incident.

The basic idea here is przlib really expects String to function well as a byte buffer. E.g., it'll allocate a 64 KB string and then execute String#setbyte on each position. Currently, with ropes, this results in the construction of 64K ConcatRope instances and degrades to linked list performance for any String#getbyte calls.

There are likely improvements we can make to ropes to better support this use case, but we need zlib working now. I anticipate all of this code being replaced with something native, so I took some liberties in modifying files in place. If we need to, we can also revert the commit or merge later on.

I've also added specs for Rubinius::ByteArray. They're not comprehensive and nor is the API -- it's for internal use so I don't think we need to support all the various call forms that String does. E.g., I rewrote any form of String#[Range] to Rubinius::ByteArray[index, length].

Prior to this change, przlib used Strings as byte buffers. This owes a lot to the legacy of Ruby Strings being litle more than dumb byte arrays in Ruby < 1.9. However, these usage patterns do not work terribly well with alternative String representations, such as ropes. By switching to a dedicated byte buffer type we can achive much better performance than with rope-backed Strings and use a more suitable API.
@nirvdrum nirvdrum added this to the truffle-dev milestone Dec 18, 2016
@bjfish
Copy link
Contributor

bjfish commented Dec 18, 2016

I'm curious if it fixes the case mentioned here: #4133

Copy link
Contributor

@chrisseaton chrisseaton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't been annoyed by zlib. Have you been annoyed enough to make this worthwhile?

Why can't we use a String that uses a mutable byte[] instead of a rope? We already have a mutable rope for that don't we? przlib won't be the only code that doesn't play well with our current system for ropes.


module Rubinius
class ByteArray

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we not usually indent twice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of the nested Rubinius module code is indented twice. I just followed that style. E.g., see the various Rubinius::FFI classes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OTOH, string_mirror.rb and process_mirror.rb indent. I think it's better to indent, as that's the most common style for Ruby code.

Copy link
Member

@pitr-ch pitr-ch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems good as a temporary solution. How difficult is it to replace whole rope tree with mutable version after it reaches certain depth? Although if merged, we lose a good pathological test-case for ropes.

describe 'with index and length' do

it 'should return the index corresponding to the first occurrence of the value' do

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Traditionally in specs, there is no extra vertical space before/after it, describe or around the example code.
IMHO it hurts readability.

@Specialization
public DynamicObject allocate(DynamicObject rubyClass) {
throw new RaiseException(coreExceptions().typeErrorAllocatorUndefinedFor(rubyClass, this));
return allocateObjectNode.allocate(rubyClass, ByteList.EMPTY_BYTELIST);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds dangerous, the EMPTY_BYTELIST might not remain empty very long.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's immediately replaced in initialize. I can make the field @Nullable instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could define ByteArray.new so allocation and initialization would not need to be split. @Nullable seems fine too.

@eregon
Copy link
Member

eregon commented Dec 19, 2016

Wild idea: could we just use an Array instead of ByteArray here? It would take more space in memory, but require no new code except changes in pr-zlib.

@chrisseaton
Copy link
Contributor

Hmmm yeah. And add a new byte specialisation to Array?

@eregon
Copy link
Member

eregon commented Dec 19, 2016

I'd rather not deal with byte in Ruby, so I was thinking int[].

@nirvdrum
Copy link
Contributor Author

I could look at that again. I started going down that path, but encountered quite a few places things needed to change and we already had a specialized version of array for bytes, so I enhanced that one. I was trying to avoid changing method calls in przlib too much as they expect a string-like API.

@eregon
Copy link
Member

eregon commented Dec 19, 2016

Seems like an OK workaround, but we probably want quickly a native zlib, at least for the most common methods.
Ropes will need to handle some fallback when getting too deep, but I guess that can wait a bit longer.

@nirvdrum
Copy link
Contributor Author

I needed to do this because we can't install gems with zlib as it stands. Maybe increasing the heap will do the trick, but as I mentioned in the PR description it creates very deep ropes and is incredibly slow (assuming you don't run out of memory first).

The mutable rope is extremely limited. I started down that path, but it was quite invasive. This is meant to be a patch over until we have a better story for mutable byte[] backed strings.

@nirvdrum
Copy link
Contributor Author

Improvements made to ropes have drastically reduced the performance gap that this PR was intended to address. While this branch is still faster (14s vs 17s for zlib specs), it's probably not a strong enough improvement to justify its cost.

@nirvdrum nirvdrum closed this Dec 21, 2016
@nirvdrum nirvdrum deleted the truffle-enhanced-byte-array branch December 21, 2016 12:05
nirvdrum referenced this pull request in oracle/truffleruby Jan 30, 2017
@enebo enebo added this to the Non-Release milestone Dec 7, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants