Error handling converting UTF-32 to UTF-8 is broken [9k] [lotus] #2581

Closed
PragTob opened this issue Feb 8, 2015 · 6 comments

Comments

@PragTob

PragTob commented Feb 8, 2015

The UTF-32 encoding seems to be broken when converting to UTF-8. The input doesn't seem to matter as long as it is not an empty string.

I have jruby-head from today:

tobi@tobi-desktop ~/github/lotus_components/utils $ ruby -v
jruby 9.0.0.0-SNAPSHOT (2.2.0p0) 2015-02-08 cc00fd4 OpenJDK 64-Bit Server VM 24.75-b04 on 1.7.0_75-b13 +jit [linux-amd6

The problem seems to be that as soon as a string contains a character and is converted to UTF-32, converting it on to UTF-8 raises an error.

jruby-head:

jruby-head :013 > "a".encode("UTF-32")
 => "\uFEFFa" 
jruby-head :014 > "a".encode("UTF-32").encode(Encoding::UTF_8)
Encoding::InvalidByteSequenceError: "\x00\x00\xFE\xFF" on UTF-32
    from org/jruby/RubyString.java:5671:in `encode'
    from (irb):14:in `evaluate'
    from org/jruby/RubyKernel.java:1000:in `eval'
    from org/jruby/RubyKernel.java:1310:in `loop'
    from org/jruby/RubyKernel.java:1120:in `catch'
    from org/jruby/RubyKernel.java:1120:in `catch'
    from /home/tobi/.rvm/rubies/jruby-head/bin/irb:13:in `__script__'

2.2:

2.2.0 :016 > "a".encode("UTF-32")
 => "\uFEFFa" 
2.2.0 :017 > "a".encode("UTF-32").encode(Encoding::UTF_8)
 => "a"
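
The comparison above could be turned into a small regression check. The following is a minimal sketch and not part of the original report: the helper name `utf32_round_trip` and the sample strings are assumptions chosen for illustration. On an affected JRuby build the first call would be expected to raise `Encoding::InvalidByteSequenceError`, as in the jruby-head session; on MRI 2.2 each string should come back unchanged.

```ruby
# Hypothetical helper: encode to UTF-32 (the BOM-carrying variant)
# and transcode back to UTF-8, as in the sessions above.
def utf32_round_trip(str)
  str.encode("UTF-32").encode(Encoding::UTF_8)
end

# Sample inputs are illustrative; \u escapes keep the literals UTF-8.
samples = ["a", "h\u00e9llo", "\u65e5\u672c\u8a9e"]

samples.each do |sample|
  begin
    result = utf32_round_trip(sample)
    status = result == sample ? "ok" : "MISMATCH"
    puts "#{sample.inspect} -> #{result.inspect} (#{status})"
  rescue Encoding::InvalidByteSequenceError => e
    # This is the failure mode reported for jruby-head.
    puts "#{sample.inspect} -> raised #{e.class}: #{e.message}"
  end
end
```

Rescuing inside the loop keeps the script running across all samples, so a single run shows whether every non-empty input fails or only some of them.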

Discovered in lotus utils.

Tobi

@headius
Member

headius commented Mar 12, 2015

Wow, that's unexpected. This logic should be using the MRI transcoding subsystem pretty much as-is.

@headius
Member

headius commented Mar 12, 2015

Seems to be something wrong with the "dummy" encodings:

irb(main):001:0> "a".encode("UTF-32").encode(Encoding::UTF_8)
Encoding::InvalidByteSequenceError: "\x00\x00\xFE\xFF" on UTF-32
    from org/jruby/RubyString.java:5669:in `encode'
    from (irb):1:in `<eval>'
    from org/jruby/RubyKernel.java:1005:in `eval'
    from org/jruby/RubyKernel.java:1315:in `loop'
    from org/jruby/RubyKernel.java:1125:in `catch'
    from org/jruby/RubyKernel.java:1125:in `catch'
    from /Users/headius/projects/jruby/bin/jirb:13:in `<top>'
irb(main):002:0> "a".encode("UTF-32BE").encode(Encoding::UTF_8)
=> "a"
irb(main):003:0> "a".encode("UTF-32").encoding
=> #<Encoding:UTF-32 (dummy)>
irb(main):004:0> "a".encode("UTF-32BE").encoding
=> #<Encoding:UTF-32BE>
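
The bytes quoted in the error message, `\x00\x00\xFE\xFF`, are the UTF-32 big-endian byte order mark. This suggests the failure happens while the transcoder reads the BOM that the plain `UTF-32` encoding prepends. The sketch below is one way to inspect the difference between the two encodings; the helper name `describe` and the shape of its return value are assumptions for illustration, and only `Encoding#dummy?`, `String#bytes` and `String#bytesize` are standard Ruby calls.

```ruby
# Hypothetical helper: summarize how a string was encoded.
def describe(str)
  # UTF-32 big-endian byte order mark, as seen in the error message.
  utf32_be_bom = [0x00, 0x00, 0xFE, 0xFF]

  {
    encoding: str.encoding.name,
    dummy: str.encoding.dummy?,                       # the flag shown by #inspect
    starts_with_bom: str.bytes.first(4) == utf32_be_bom,
    byte_count: str.bytesize
  }
end

p describe("a".encode("UTF-32"))    # BOM-carrying, "dummy" encoding
p describe("a".encode("UTF-32BE"))  # explicit byte order
```

If the `UTF-32` result reports a leading BOM and the `UTF-32BE` result does not, that is consistent with the session above, where only the explicit-endianness conversion round-trips on the affected build.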

@lopex What are these dummy encodings for?

@headius
Member

headius commented Mar 12, 2015

It appears that at some point "dummy" encodings became "replicate" encodings, so I'm trying to make that change to our encoding list too.

@headius
Member

headius commented Mar 13, 2015

Bleh, opened a can of worms. Additional fixes coming in.

@headius
Member

headius commented Mar 13, 2015

Multiple fixes to jcodings and I think we're back in business. Your case works and all previous passing cases work. Will explore tags/excludes now.

headius closed this as completed Mar 13, 2015
headius added this to the 9.0.0.0.pre2 milestone Mar 13, 2015
@PragTob
Author

PragTob commented Mar 14, 2015

👍 Thanks a lot Charlie!

headius added a commit to jruby/jcodings that referenced this issue Mar 16, 2015
Dummy flag is used in various places, so these replicas can't be
perfect replicas. See jruby/jruby#2581.
headius added a commit to jruby/jcodings that referenced this issue Mar 16, 2015
headius added a commit that referenced this issue Mar 16, 2015
…."

This reverts commit 3f5a605.

Conflicts:
	core/pom.xml