[Truffle] Incorrect string size #4396

bjfish · 2016-12-18T22:05:09Z

I ran into this issue while working on the encode specs and scrub-related code.

Example

x81 = [0x81].pack('C').force_encoding('utf-8')
puts "abc\u3042#{x81}".size

Expected Behavior

$ ruby ../jruby-patches/str_size.rb 
5

Actual Behavior

$ jt run ../jruby-patches/str_size.rb 
7

nirvdrum · 2016-12-19T03:39:33Z

I'll investigate more. At first blush, you're probably hitting this TODO:

jruby/truffle/src/main/java/org/jruby/truffle/core/rope/ValidLeafRope.java

Lines 27 to 28 in f17102c

    
           // TODO (nirvdrum 08-Mar-16): This should recalculate the character length since the new encoding may treat the bytes differently. 
        
           return new ValidLeafRope(getRawBytes(), newEncoding, characterLength());

nirvdrum · 2016-12-19T14:57:21Z

I've fixed this in 3f4af19

The real problem is that we reported the byte length as the string length for broken strings, whereas MRI tries to calculate the character length and stops when it encounters an invalid byte sequence. Neither value is correct: the string is broken, so there is no correct value to calculate. I had previously opted for the faster calculation, but broken strings are likely to be infrequent, so we can afford to be slower in order to match MRI. Even so, I wouldn't recommend relying on that value, since it doesn't seem to be well defined.
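For code that needs a dependable count, here is a minimal sketch that avoids calling size on a broken string at all. It reuses the string from the example in this issue; the variable names and the "?" replacement character are illustrative choices, not part of the fix in 3f4af19. It relies only on String#bytesize, String#valid_encoding? and String#scrub, which are available in MRI 2.1 and later.

```ruby
# Build the string from the issue: the trailing 0x81 byte is not valid UTF-8.
x81 = [0x81].pack('C').force_encoding('utf-8')
str = "abc\u3042#{x81}"

# The byte length is always well defined: 3 (abc) + 3 (U+3042) + 1 (0x81).
byte_count = str.bytesize

# Detect the broken string instead of trusting its character length.
valid = str.valid_encoding?

# Replace each invalid byte sequence, then count characters on the repaired copy.
scrubbed   = str.scrub("?")
char_count = scrubbed.size

puts "bytes: #{byte_count}, valid: #{valid}, chars after scrub: #{char_count}"
```

Counting characters only after scrubbing gives a number that does not depend on how an implementation treats invalid byte sequences, at the cost of an extra copy of the string.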

bjfish added the truffle label Dec 18, 2016

bjfish assigned nirvdrum Dec 18, 2016

nirvdrum closed this as completed Dec 19, 2016

nirvdrum added this to the truffle-dev milestone Dec 19, 2016

enebo added this to the Non-Release milestone Dec 7, 2017
