Error handling converting UTF-32 to UTF-8 is broken [9k] [lotus] #2581

PragTob · 2015-02-08T16:37:13Z

The UTF-32 encoding seems to be broken when converting to UTF-8. The input doesn't seem to matter as long as it is not an empty string.

I have jruby-head from today:

tobi@tobi-desktop ~/github/lotus_components/utils $ ruby -v
jruby 9.0.0.0-SNAPSHOT (2.2.0p0) 2015-02-08 cc00fd4 OpenJDK 64-Bit Server VM 24.75-b04 on 1.7.0_75-b13 +jit [linux-amd6

The problem seems to be that as soon as a string contains at least one character and is converted to UTF-32, converting the result to UTF-8 raises an error.

jruby-head:

jruby-head :013 > "a".encode("UTF-32")
 => "\uFEFFa" 
jruby-head :014 > "a".encode("UTF-32").encode(Encoding::UTF_8)
Encoding::InvalidByteSequenceError: "\x00\x00\xFE\xFF" on UTF-32
    from org/jruby/RubyString.java:5671:in `encode'
    from (irb):14:in `evaluate'
    from org/jruby/RubyKernel.java:1000:in `eval'
    from org/jruby/RubyKernel.java:1310:in `loop'
    from org/jruby/RubyKernel.java:1120:in `catch'
    from org/jruby/RubyKernel.java:1120:in `catch'
    from /home/tobi/.rvm/rubies/jruby-head/bin/irb:13:in `__script__'

2.2:

2.2.0 :016 > "a".encode("UTF-32")
 => "\uFEFFa" 
2.2.0 :017 > "a".encode("UTF-32").encode(Encoding::UTF_8)
 => "a"
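For anyone who wants to check this outside of irb, the two transcripts above can be condensed into a small script. This is only a sketch: `round_trip` is a hypothetical helper name made up for illustration, and the `rescue` branch is there purely so the script reports the failure instead of aborting.

```ruby
# Minimal reproduction sketch for the round trip shown above.
# `round_trip` is a hypothetical helper, not part of Ruby or JRuby.
def round_trip(text, via)
  encoded = text.encode(via)        # "UTF-32" prepends a byte order mark
  encoded.encode(Encoding::UTF_8)   # the step that raised on the affected jruby-head
rescue Encoding::InvalidByteSequenceError => e
  "failed: #{e.message}"
end

result = round_trip("a", "UTF-32")
puts result
```

On MRI 2.2 this should print the original string, matching the second transcript; on the affected jruby-head build the expectation is the `failed:` line instead.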

Discovered on lotus utils

Tobi


headius · 2015-03-12T22:05:49Z

Wow, that's unexpected. This logic should be using the MRI transcoding subsystem pretty much as-is.

headius · 2015-03-12T22:07:10Z

Seems to be something wrong with the "dummy" encodings:

irb(main):001:0> "a".encode("UTF-32").encode(Encoding::UTF_8)
Encoding::InvalidByteSequenceError: "\x00\x00\xFE\xFF" on UTF-32
    from org/jruby/RubyString.java:5669:in `encode'
    from (irb):1:in `<eval>'
    from org/jruby/RubyKernel.java:1005:in `eval'
    from org/jruby/RubyKernel.java:1315:in `loop'
    from org/jruby/RubyKernel.java:1125:in `catch'
    from org/jruby/RubyKernel.java:1125:in `catch'
    from /Users/headius/projects/jruby/bin/jirb:13:in `<top>'
irb(main):002:0> "a".encode("UTF-32BE").encode(Encoding::UTF_8)
=> "a"
irb(main):003:0> "a".encode("UTF-32").encoding
=> #<Encoding:UTF-32 (dummy)>
irb(main):004:0> "a".encode("UTF-32BE").encoding
=> #<Encoding:UTF-32BE>

@lopex What are these dummy encodings for?
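To make the difference between the two encodings a bit more visible, here is a short sketch that prints the dummy flag and the raw bytes for each. It only uses `Encoding.find`, `Encoding#dummy?` and `String#bytes`; the `report` hash is just a name chosen for this example.

```ruby
# Compare the dummy "UTF-32" encoding with the concrete "UTF-32BE".
# `report` is an illustrative name, not anything from JRuby or jcodings.
report = {}

%w[UTF-32 UTF-32BE].each do |name|
  enc = Encoding.find(name)
  report[name] = {
    dummy: enc.dummy?,               # true for encodings flagged as dummy
    bytes: "a".encode(enc).bytes     # raw bytes produced by the transcoder
  }
  puts "#{name}: dummy=#{report[name][:dummy]} bytes=#{report[name][:bytes].inspect}"
end
```

On MRI the "UTF-32" result starts with the bytes `0, 0, 254, 255`, which is the same `"\x00\x00\xFE\xFF"` sequence named in the error message above, while "UTF-32BE" produces no byte order mark.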

headius · 2015-03-12T22:12:00Z

It appears that at some point "dummy" encodings became "replicate" encodings, so I'm trying to make that change to our encoding list too.

headius · 2015-03-13T15:17:31Z

Bleh, opened a can of worms. Additional fixes coming in.


headius · 2015-03-13T16:20:04Z

Multiple fixes to jcodings and I think we're back in business. Your case works and all previous passing cases work. Will explore tags/excludes now.
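A rough way to check a build locally, assuming the fix is meant to cover both BOM-carrying encodings touched here (UTF-16 and UTF-32), is a loop like the one below. The sample strings and the `failures` list are assumptions made for this sketch; it is not the JRuby test suite or the tags/excludes mechanism.

```ruby
# Round-trip a few sample strings through the dummy UTF-16 and UTF-32
# encodings and collect anything that does not come back unchanged.
# Samples and names are illustrative only.
samples   = ["a", "héllo", "日本語"]
encodings = %w[UTF-16 UTF-32]
failures  = []

encodings.each do |name|
  samples.each do |text|
    begin
      back = text.encode(name).encode(Encoding::UTF_8)
      failures << [name, text, back] unless back == text
    rescue EncodingError => e
      # Any transcoding error counts as a failure for this check.
      failures << [name, text, e.class.name]
    end
  end
end

puts failures.empty? ? "all round trips ok" : failures.inspect
```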

PragTob · 2015-03-14T10:55:48Z

👍 Thanks a lot Charlie!


PragTob mentioned this issue Feb 8, 2015

Simplify travis (jruby-head runs 2.2 compatible by default) hanami/utils#59

Merged

subbuss added the JRuby 9000 label Feb 16, 2015

headius closed this as completed in 3f5a605 Mar 12, 2015

headius reopened this Mar 13, 2015

headius added a commit that referenced this issue Mar 13, 2015

Compat fixes for String#inspect wrt dummy UTF encodings.

912e77d

See #2581.

headius closed this as completed Mar 13, 2015

headius added this to the 9.0.0.0.pre2 milestone Mar 13, 2015

headius added a commit to jruby/jcodings that referenced this issue Mar 16, 2015

Dummy UTF-32 and UTF-16 need to be replicas with dummy flag.

4bcfdc3

Dummy flag is used in various places, so these replicas can't be perfect replicas. See jruby/jruby#2581.

headius added a commit to jruby/jcodings that referenced this issue Mar 16, 2015

Typos in port of UTF-32 input handling.

042076b

Relates to jruby/jruby#2581.

headius added a commit that referenced this issue Mar 16, 2015

Compat fixes for String#inspect wrt dummy UTF encodings.

af940b1

See #2581.

headius added a commit that referenced this issue Mar 16, 2015

Revert "Fix jcodings mapping for UTF-32 and UTF-16 (to BE). Fixes #2581…

24ba95a

…." This reverts commit 3f5a605. Conflicts: core/pom.xml
