-
-
Notifications
You must be signed in to change notification settings - Fork 925
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Truffle] Enhanced byte array. #4397
Conversation
Prior to this change, przlib used Strings as byte buffers. This owes a lot to the legacy of Ruby Strings being litle more than dumb byte arrays in Ruby < 1.9. However, these usage patterns do not work terribly well with alternative String representations, such as ropes. By switching to a dedicated byte buffer type we can achive much better performance than with rope-backed Strings and use a more suitable API.
I'm curious if it fixes the case mentioned here: #4133 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't been annoyed by zlib
. Have you been annoyed enough to make this worthwhile?
Why can't we use a String
that uses a mutable byte[]
instead of a rope? We already have a mutable rope for that don't we? przlib
won't be the only code that doesn't play well with our current system for ropes.
|
||
module Rubinius | ||
class ByteArray | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we not usually indent twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None of the nested Rubinius module code is indented twice. I just followed that style. E.g., see the various Rubinius::FFI
classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OTOH, string_mirror.rb
and process_mirror.rb
indent. I think it's better to indent, as that's the most common style for Ruby code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems good as a temporary solution. How difficult is it to replace whole rope tree with mutable version after it reaches certain depth? Although if merged, we lose a good pathological test-case for ropes.
describe 'with index and length' do | ||
|
||
it 'should return the index corresponding to the first occurrence of the value' do | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Traditionally in specs, there is no extra vertical space before/after it
, describe
or around the example code.
IMHO it hurts readability.
@Specialization | ||
public DynamicObject allocate(DynamicObject rubyClass) { | ||
throw new RaiseException(coreExceptions().typeErrorAllocatorUndefinedFor(rubyClass, this)); | ||
return allocateObjectNode.allocate(rubyClass, ByteList.EMPTY_BYTELIST); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds dangerous, the EMPTY_BYTELIST
might not remain empty very long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's immediately replaced in initialize
. I can make the field @Nullable
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could define ByteArray.new
so allocation and initialization would not need to be split. @Nullable
seems fine too.
Wild idea: could we just use an Array instead of ByteArray here? It would take more space in memory, but require no new code except changes in pr-zlib. |
Hmmm yeah. And add a new |
I'd rather not deal with |
I could look at that again. I started going down that path, but encountered quite a few places things needed to change and we already had a specialized version of array for bytes, so I enhanced that one. I was trying to avoid changing method calls in przlib too much as they expect a string-like API. |
Seems like an OK workaround, but we probably want quickly a native zlib, at least for the most common methods. |
I needed to do this because we can't install gems with zlib as it stands. Maybe increasing the heap will do the trick, but as I mentioned in the PR description it creates very deep ropes and is incredibly slow (assuming you don't run out of memory first). The mutable rope is extremely limited. I started down that path, but it was quite invasive. This is meant to be a patch over until we have a better story for mutable |
Improvements made to ropes have drastically reduced the performance gap that this PR was intended to address. While this branch is still faster (14s vs 17s for zlib specs), it's probably not a strong enough improvement to justify its cost. |
Full disclosure: I don't have confidence I've accounted for every change in przlib here. However, with the change we still pass the zlib specs we passed prior to this change and I've managed to successfully install a gem with this code in place. So, while there may very well be something broken, we are able to run at least one complicated use case without incident.
The basic idea here is przlib really expects String to function well as a byte buffer. E.g., it'll allocate a 64 KB string and then execute
String#setbyte
on each position. Currently, with ropes, this results in the construction of 64KConcatRope
instances and degrades to linked list performance for anyString#getbyte
calls.There are likely improvements we can make to ropes to better support this use case, but we need zlib working now. I anticipate all of this code being replaced with something native, so I took some liberties in modifying files in place. If we need to, we can also revert the commit or merge later on.
I've also added specs for
Rubinius::ByteArray
. They're not comprehensive and nor is the API -- it's for internal use so I don't think we need to support all the various call forms thatString
does. E.g., I rewrote any form ofString#[Range]
toRubinius::ByteArray[index, length]
.