[Truffle] Incorrect string size #4396

Closed
bjfish opened this issue Dec 18, 2016 · 2 comments

@bjfish

bjfish commented Dec 18, 2016

I ran into this issue while working on the encode specs and scrub support.

Example

x81 = [0x81].pack('C').force_encoding('utf-8')
puts "abc\u3042#{x81}".size

Expected Behavior

$ ruby ../jruby-patches/str_size.rb 
5

Actual Behavior

$ jt run ../jruby-patches/str_size.rb 
7
@nirvdrum

I'll investigate more. At first blush, you're probably hitting this TODO:

// TODO (nirvdrum 08-Mar-16): This should recalculate the character length since the new encoding may treat the bytes differently.
return new ValidLeafRope(getRawBytes(), newEncoding, characterLength());

@nirvdrum

I've fixed this in 3f4af19

The real problem is that we reported the byte length as the string length for broken strings, while MRI tries to calculate the character length and stops when it encounters an invalid byte sequence. Neither value is correct: the string is broken, so you can't calculate a correct value. I had previously opted for the faster calculation, but broken strings are likely to be infrequent, so we can afford to be slower in order to match MRI. That said, I wouldn't recommend relying on that value, since it doesn't seem to be well defined.
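
For readers who want to avoid depending on the size of a broken string, the following is a minimal sketch that is not part of the original thread. It reuses the string from the example above and checks `valid_encoding?` before trusting `size`. The helper name `safe_char_count` is hypothetical, and replacing invalid bytes with `scrub` is only one possible policy; it assumes each invalid byte should count as one replacement character.

```ruby
# Build the same broken string as in the report: four valid UTF-8
# characters followed by a lone 0x81 byte, which is not valid UTF-8.
x81 = [0x81].pack('C').force_encoding('utf-8')
str = "abc\u3042#{x81}"

# The byte count is always well defined: 3 (abc) + 3 (U+3042) + 1 (0x81).
puts str.bytesize          # => 7

# The string is broken, so its character count is implementation-dependent.
puts str.valid_encoding?   # => false

# Hypothetical helper: only trust String#size when the encoding is valid;
# otherwise replace invalid bytes first (String#scrub, Ruby 2.1+) so the
# count is taken over a well-formed string.
def safe_char_count(s)
  s.valid_encoding? ? s.size : s.scrub.size
end

puts safe_char_count(str)
```

Counting on the scrubbed copy costs an extra allocation, but it makes the result independent of how a given Ruby implementation treats invalid byte sequences.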

@nirvdrum nirvdrum added this to the truffle-dev milestone Dec 19, 2016
@enebo enebo added this to the Non-Release milestone Dec 7, 2017