Unmarshaled symbol has the wrong encoding #1329
Comments
I suspect this is a direct consequence of #1328 since that issue establishes that the encoding of the symbol is wrong in the first place. So it is marshalling that improperly encoded symbol just fine.
Thanks for taking a look at this. To simplify the issue, we could just consider the following code:
When I run it with JRuby I get "US-ASCII" and when I run it with MRI I get "UTF-8".
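The snippet from this comment is not preserved in the thread text. As a rough sketch of a check of this kind, assuming a made-up string with one non-ASCII character converted with to_sym (the string and the helper name `symbol_encoding_name` are illustrative, not the original code):

```ruby
# Hypothetical helper: report the encoding of the symbol built from a string.
def symbol_encoding_name(text)
  text.to_sym.encoding.name
end

# "caf\u00e9" is an illustrative UTF-8 string, not the one from the report.
puts symbol_encoding_name("caf\u00e9")

# For comparison, a symbol made only of ASCII characters.
puts symbol_encoding_name("abc")
```

On MRI the first call reports UTF-8; the comment above says JRuby reported US-ASCII for its example at the time.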
Sorry I missed that you used to_sym in that example. This indeed appears to be a problem...
I sat down and studied the marshaled data we are working with and figured out what each byte means.
(I figured all this out by studying the MRI source code. I have not looked up the official Marshal format documentation if there is one, so some parts of my description might show that ignorance.)
To summarize, it looks like we have a symbol whose value consists of some UTF-8 bytes and it is marshaled as if it has one instance variable
When unmarshalling the symbol in https://github.com/ruby/ruby/blob/v2_0_0_352/marshal.c#L1237-1250
It seems that there are four types of things that are recognized:
From looking at the JRuby code, I don't think it has the equivalent of MRI's
@enebo, let me know if you have any objections or advice! I hope I can write a good pull request for this. I have never contributed to JRuby before.
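The byte-by-byte breakdown from this comment is not preserved, but the general shape of the data can be inspected directly. A minimal sketch, assuming a made-up symbol (`SAMPLE`) and a hypothetical helper (`marshal_bytes`); the annotations describe what MRI emits for this particular symbol and are not a copy of the original analysis:

```ruby
# Hypothetical sample symbol containing one non-ASCII character (e with acute).
SAMPLE = "caf\u00e9".to_sym

# Hypothetical helper: render the marshaled form of an object as hex bytes.
def marshal_bytes(obj)
  Marshal.dump(obj).bytes.map { |b| format("%02x", b) }.join(" ")
end

puts marshal_bytes(SAMPLE)

# Shape of the dump on MRI for this symbol:
#   04 08            Marshal format version 4.8
#   49               'I': the next object carries instance variables
#   3a               ':' marks a symbol
#   0a               length 5 (small integers are stored as n + 5)
#   63 61 66 c3 a9   the UTF-8 bytes of the symbol name
#   06               one instance variable follows
#   3a 06 45         the symbol :E (the encoding flag)
#   54               'T' (true); an E value of true indicates UTF-8
```

A loader that ignores the trailing instance variable would still recover the right bytes but would not know to tag them as UTF-8, which matches the symptom reported here.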
It looks like you have dug into this just fine and it is pretty clear how MRI works in seeing that method. It will be great if you can make a PR to correct this in our marshalling.
I tested this again today with MRI 2.2.0p0 and JRuby 9.0.0.0.pre1 and nothing has changed. I could probably make a pull request that fixes it 99.9% of the time, but issue #1348 is still open so it would not be a complete fix. As described in issue #1348, symbols retrieved from the JRuby symbol table will sometimes have the wrong encoding, so we cannot guarantee that unmarshalled symbols will have the right encoding until that is fixed.
I just tested it again, and this problem is still present in JRuby 9.0.0.0.
Phew, this bug has been haunting me for the last few days. I've been using the u2f gem with the latest JRuby (9.1.7.0) and jruby-openssl. When trying to register my U2F device, I get
Turns out this is essentially a catch-all exception for "couldn't parse your certificate" in the jruby-openssl implementation. I set up a dummy project to test this with MRI Ruby 1.9.3 and a few newer versions and everything ran smoothly. I was eventually able to get around the issue by calling
Thanks @DavidEGrayson for doing the research on this. I hope this issue does get resolved once and for all eventually :)
This appears to be working ok in 9.1.8.0 (the encoding comes back as UTF-8) and we have further fixes for encodings in Strings (see #1348, #4564). Marking this as fixed in 9.1.9.0. I'm marking this fixed as of 9.1.8.0.
@anthonylebrun Sorry to hear about your trouble! If it's still happening for you with 9.1.8.0, try master (9.1.9.0). If that also breaks, open an issue and we'll look into it!
JRuby behaves differently from MRI when it is unmarshaling symbols. The symbol always seems to have the US-ASCII encoding, even if it contains non-ASCII Unicode characters.
To reproduce this, I needed two separate scripts. (It seems that the state of JRuby's symbol table affects how Marshal.load behaves.)
In test1.rb, I have:
In test2.rb, I have:
Here is the output I get from running these scripts, and also information about the versions of Ruby I am using:
From this we can see that both JRuby and MRI are marshaling the data in the same way, but when JRuby unmarshals it, it is setting the encoding to US-ASCII instead of UTF-8.
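The two scripts and their output are not preserved in the thread text. A minimal sketch of a two-step reproduction along these lines, assuming the first script dumps a symbol to disk and the second loads it (the symbol, the file name, and the helper names are all made up for illustration):

```ruby
require "tmpdir"

# Role of the first script (assumed): marshal a symbol with a non-ASCII
# character and write the bytes to disk.
def write_marshaled_symbol(path)
  File.binwrite(path, Marshal.dump("caf\u00e9".to_sym))
end

# Role of the second script (assumed): load the symbol back and report
# the name of its encoding.
def read_symbol_encoding(path)
  Marshal.load(File.binread(path)).encoding.name
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "symbol.bin")
  write_marshaled_symbol(path)
  puts read_symbol_encoding(path)
end
```

Both steps run in one process here to keep the sketch self-contained. Since the report says the state of the symbol table matters, reproducing the original behavior would mean running the two halves as separate scripts so the symbol does not already exist when Marshal.load runs.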
This issue came up because I am trying to use YARD to generate documentation for JRuby code that has special characters in a few method alias names. When I run "yard doc", the data about those methods is marshaled and written to the disk, and when I run "yard server --reload" it gets unmarshaled badly.
One workaround for this issue is to create a symbol with the proper encoding before running Marshal.load.
Sorry if this is a duplicate. This could be related to the issue with symbol literal encoding that I just reported, #1328. I also see there is another open issue about method that is probably related to symbol encoding: #914.
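The workaround mentioned above can be sketched roughly as follows. This assumes the names of the affected symbols are known ahead of time; `DUMPED`, `load_with_workaround`, and the sample symbol are hypothetical and only illustrate the idea:

```ruby
# Marshaled form of a hypothetical UTF-8 symbol, as it might be read from a
# file written by an earlier run.
DUMPED = "\x04\bI:\ncaf\xC3\xA9\x06:\x06ET".b

# Hypothetical helper: create each expected symbol from a UTF-8 string first,
# so it is already in the symbol table, then unmarshal the data.
def load_with_workaround(data, expected_names)
  expected_names.each { |name| name.encode("UTF-8").to_sym }
  Marshal.load(data)
end

sym = load_with_workaround(DUMPED, ["caf\u00e9"])
puts sym.encoding
```

The obvious limitation is that the caller has to know which symbols will appear in the data, so this is a stopgap rather than a fix.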