Fix Symbol#inspect of UTF_16/UTF_32 #4994

Merged: 1 commit, Jan 22, 2018


yui-knk
Contributor

@yui-knk yui-knk commented Jan 21, 2018

Stop appending a byte before inspecting the string.
For example, when the encoding of `symbolBytes` is UTF_16LE, "a" is
`[0x61, 0x00]`. If we append `":"` (0x3A) to `symbolBytes` before
inspecting it, the bytes are `[0x3A, 0x61, 0x00]` with UTF_16LE encoding,
an odd-length sequence that no longer decodes as ":a".
This is not what we want. This commit changes the order of
inspecting and appending to avoid this.

Ref: https://github.com/ruby/ruby/blob/v2_5_0/string.c#L10402
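
To make the byte-level problem concrete, here is a minimal sketch using only the JDK. It is not the JRuby code touched by this PR: the class and method names are hypothetical, and decoding with `new String(bytes, charset)` merely stands in for inspecting. It compares putting the raw 0x3A byte in front of UTF_16LE bytes with decoding first and adding ":" afterwards.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class InspectOrderDemo {
    // Old order (hypothetical model): put the raw byte 0x3A in front of the
    // encoded bytes, then interpret the whole array in the symbol's encoding.
    static String prependThenInspect(byte[] symbolBytes, Charset enc) {
        byte[] withColon = new byte[symbolBytes.length + 1];
        withColon[0] = 0x3A;
        System.arraycopy(symbolBytes, 0, withColon, 1, symbolBytes.length);
        return new String(withColon, enc);
    }

    // New order (hypothetical model): interpret the bytes first, then add ":"
    // to the result, which is no longer tied to the 2-byte code units.
    static String inspectThenPrepend(byte[] symbolBytes, Charset enc) {
        String inspected = new String(symbolBytes, enc);
        return ":" + inspected;
    }

    public static void main(String[] args) {
        byte[] a = "a".getBytes(StandardCharsets.UTF_16LE); // [0x61, 0x00]

        // 0x3A and 0x61 are read together as one UTF_16LE code unit (U+613A),
        // and the trailing 0x00 is left over as an incomplete unit.
        String broken = prependThenInspect(a, StandardCharsets.UTF_16LE);
        System.out.println(Integer.toHexString(broken.charAt(0)));

        // Decoding first keeps "a" intact.
        System.out.println(inspectThenPrepend(a, StandardCharsets.UTF_16LE));
    }
}
```

The first line printed is the hex value of the merged code unit; the second is the expected `:a`.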

@enebo enebo added this to the JRuby 9.1.16.0 milestone Jan 22, 2018
@enebo enebo merged commit 342268d into jruby:jruby-9.1 Jan 22, 2018
@enebo
Member

enebo commented Jan 22, 2018

@yui-knk You even seem to have fixed a second bug here, where we were adding :" at the front of a symbol but no closing ". I might change this code now that you have fixed this, because we potentially make 3 instances of RubyString depending on the symbol being inspected. I think we can reduce this to just one.

@enebo
Member

enebo commented Jan 22, 2018

Actually, I am not planning on changing this: 1) :sym.inspect is exceedingly rare in hot code; 2) guts of bytelist vs RubyString and ability to determine CR_7BIT is much simpler if we make a string first. Working around that to defer making the string would involve some new code.

I did glance at MRI, and they remove some of this cost by using memcpy/memmove to set the ':' and the contents of the string. We could optimize in this way if we wanted, but due to 1) above I am not inclined to put in that extra effort :)
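
For illustration only, the memcpy/memmove idea could look roughly like this in Java. This is a sketch, not code from JRuby or MRI: `withColonPrefix` is a hypothetical helper, and it assumes its input is an already inspected string in an ASCII-compatible encoding, so writing a single ':' byte in front is safe.

```java
import java.nio.charset.StandardCharsets;

public class ColonPrefixSketch {
    // Hypothetical helper: builds ':' + inspected bytes with one allocation
    // and one bulk copy, instead of creating intermediate string objects.
    static byte[] withColonPrefix(byte[] inspected) {
        byte[] dest = new byte[inspected.length + 1];
        // Shift the contents right by one byte (the memcpy/memmove step).
        System.arraycopy(inspected, 0, dest, 1, inspected.length);
        // Fill in the leading colon last.
        dest[0] = ':';
        return dest;
    }

    public static void main(String[] args) {
        // Assumed input: the result of inspecting the symbol's string, in ASCII.
        byte[] inspected = "\"a b\"".getBytes(StandardCharsets.US_ASCII);
        byte[] result = withColonPrefix(inspected);
        System.out.println(new String(result, StandardCharsets.US_ASCII)); // prints :"a b"
    }
}
```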

@yui-knk
Contributor Author

yui-knk commented Jan 23, 2018

> ability to determine CR_7BIT is much simpler if we make a string first

I agree :)

@yui-knk yui-knk deleted the fix_test_ascii_incomat_inspect branch January 23, 2018 00:09