Unicode characters are lost when embedding JRuby even if the calling code performs no conversion to byte[] #2403
I added this test on JRuby 1.7 HEAD and master HEAD and it passed in both places. I also confirmed that file.encoding defaults to Cp1252. Am I doing something wrong?
Issue still occurs in 1.7.13 (what I had on my home computer when I got home), 1.7.19 (downloaded now) and 9.0.0.0.pre1 (downloaded now). I verified all three on OSX and specified the -Dfile.encoding system property on the command-line. The full command-lines I used:
Of course in all cases, removing the file.encoding parameter makes the test pass. Admittedly I had to fix compilation of the test itself as well, so here it is for completeness:
Managed to build trunk, same deal:
Verifying that this not only still occurs on 9.0.5.0, but now it occurs for me on OSX under IDEA 15, so one of those or something else updated and now it occurs in a new situation. Maybe this means others can now confirm the issue, so perhaps the chance of getting a fix will increase? Current content of our test case:
The failure looks like this:
The test is apparently running with -Dfile.encoding=US-ASCII so IDEA refuses to show the text. If I change it to UTF-8 then it passes, so I can't get sensible output which shows the expected text. But you can look at the test case and see that I'm not expecting question marks, at least. If you comment out the "Wrong result returned" check, it fails on "Wrong result printed".
In the debugger, in
So it has already been destroyed at parse-time, explaining why both the result and the output are wrong.
Ruby.java line 2750 appears to be the culprit:
Call to Well, I can't figure out how this was supposed to work, but basically by the time
@trejkaz You may be right about the culprit here. Sorry this one slipped through the cracks. We do use
I have a fix for the bytes going in, but the output being returned is still getting mangled.
Ok, two fixes for this one:
Fix coming. @trejkaz Can we incorporate your test case into our suite? If you have any more you'd like to add, we'd really appreciate it!
* Incoming scripts should be decoded from String to byte[] using default internal encoding, rather than trusting JDK's file.encoding to be appropriate.
* Outgoing streams wrapping writers should decode strings based on current default internal encoding.

Fixes #2403
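For anyone following along, the idea behind both points in that commit message can be sketched in plain Java. This is not the actual JRuby change, just an illustration under assumptions: the class name `ExplicitCharsetSketch`, the helper names, and the sample script are all made up, and UTF-8 stands in for whatever encoding the runtime treats as its internal default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not JRuby code: pass the charset explicitly in both
// directions instead of relying on the JVM default taken from file.encoding.
class ExplicitCharsetSketch {

    // Incoming direction: script text (a Java String) to the bytes a parser would read.
    static byte[] encodeScript(String script, Charset scriptEncoding) {
        return script.getBytes(scriptEncoding);
    }

    // Outgoing direction: bytes written by the script back to a Java String.
    static String decodeOutput(byte[] output, Charset scriptEncoding) {
        return new String(output, scriptEncoding);
    }

    public static void main(String[] args) {
        // Sample script; the non-ASCII text is written with Unicode escapes so
        // that the encoding of this source file does not matter.
        String script = "puts '\u65e5\u672c\u8a9e'";

        byte[] bytes = encodeScript(script, StandardCharsets.UTF_8);
        String back = decodeOutput(bytes, StandardCharsets.UTF_8);

        // Same charset on both sides, so the text survives regardless of file.encoding.
        System.out.println(back.equals(script));

        // By contrast, script.getBytes() with no argument uses the JVM default
        // charset, which is where the setting of file.encoding leaks in.
    }
}
```

The point of the sketch is only that the charset is a parameter chosen by the code doing the conversion, so changing -Dfile.encoding on the command line cannot alter the result.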
Feel free to include that test in the JRuby suite. It will be good for us as it stops a regression creeping in. As far as others... I just had a look through all our tests and there is nothing else about encoding in our suite, other than a second nearly identical test which I'll probably remove now that I have found it. Thanks for the quick fix! :D
The following tests fail with -Dfile.encoding=windows-1252 but pass with -Dfile.encoding=UTF-8:
import java.io.StringWriter;
import java.io.Writer;
Most likely, the failure output you get will be confusing as well:
The "Expected" line is "?????" because Java is encoding the output as windows-1252.
The "but" line is "?????" because JRuby has encoded the strings to windows-1252 internally and then written and returned the question marks. I find it particularly odd that it would do this, both because the script is passed as a string directly from Java in the first place, but also because the script itself clearly says the strings are UTF-8.
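The question marks themselves are easy to reproduce without JRuby. The sketch below is an assumption-laden illustration, not the test from this report: the class name `LossyEncodingDemo` and the sample text are made up, and US-ASCII is used because every JVM is guaranteed to ship it. For characters it cannot represent, windows-1252 should behave the same way.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: what happens to text once it has been pushed through
// a charset that cannot represent it.
class LossyEncodingDemo {

    // Encode the text to bytes and decode it again with the same charset.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Three CJK characters, written as Unicode escapes.
        String sample = "\u65e5\u672c\u8a9e";

        // String.getBytes(Charset) substitutes the charset's replacement byte
        // for every unmappable character; for US-ASCII that byte is '?'.
        System.out.println(roundTrip(sample, StandardCharsets.US_ASCII)); // prints "???"

        // UTF-8 can represent every character, so nothing is lost.
        System.out.println(roundTrip(sample, StandardCharsets.UTF_8).equals(sample)); // prints "true"
    }
}
```

Once the substitution has happened the original characters are gone from the byte array, which would be consistent with both the returned value and the printed output showing question marks.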
This was JRUBY-4890 on the old tracker. The script had to be updated a bit to have a
#encoding: UTF-8
directive because JRuby now complains if you omit it.