JRuby 1.7.26: String#encoding returns wrong encoding? #4452
Comments
You would want to compare this to CRuby 1.9.3. I believe the change to make all strings be UTF-8 by default did not arrive until at least Ruby 2.0, and possibly later than that. If this works the same as CRuby 1.9.3, there's nothing to fix. Hopefully it works properly in JRuby 9.1.7.0, since that is the maintained line. Can you confirm both of those?
Actually, the feature of each String carrying its own encoding was introduced in CRuby 1.9. See for instance http://nuclearsquid.com/writings/ruby-1-9-encodings/ and http://ruby-doc.org/core-1.9.3/String.html#method-i-encoding . I don't think the bug is that the wrong encoding is used internally (otherwise using e.g. Japanese characters in my string wouldn't work correctly), but that String#encoding does not return the correct information. It would be helpful to compare this with CRuby 1.9.3, but I don't have one here. BTW, I tried the same with jruby-9.0.4.0 (also on Windows) and got the same behaviour, but since we unfortunately have to work with JRuby 1.7, I'm also interested in getting it fixed for this version.
@rovf I think you misunderstood what headius said. He was not saying m17n was added in Ruby 2.0, but that UTF-8 became the default encoding for strings in Ruby 2.0 (and for some reason I thought it was 2.1). So MRI 1.9.3, which 1.7 is emulating, may end up defaulting to Windows-1252 for ordinary ASCII values. I guess we need someone to find a Windows install of 1.9.3 and verify what it should be. For the Japanese character, I do find this really odd. I know there is magic on Windows for translating output to the console to CP-1252, but I find it weird that it is reporting that encoding back. @rovf I have a second request: can you put that code into a file and run the file instead of doing this via irb? I am just curious whether the behavior is different.
@enebo: You were right! When running from a file, both 1.7 and 9.x correctly show UTF-8 for both cases, so it really is only an irb issue! Should this be a bug against irb? In this case, I would install JRuby 9.1.7.0 as @headius suggested; but if you think that within irb, the behaviour is OK, we can close the issue.
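For anyone who wants to repeat the file-versus-irb comparison, here is a minimal sketch of such a script. The method name `encoding_report` is made up for illustration. The magic comment on the first line pins the source encoding, which matters on 1.9-compatible runtimes, where a file without one defaults to US-ASCII rather than UTF-8. That irb derives the encoding of typed literals from the console code page is only an assumption here; nothing in this thread confirms it.

```ruby
# encoding: utf-8
# Hypothetical helper (not part of JRuby or irb): collects the encodings
# that influence what String#encoding reports for a literal.
def encoding_report
  {
    source:   __ENCODING__,              # encoding of this source file
    literal:  'abc'.encoding,            # plain literals take the source encoding
    escaped:  "\u3046".encoding,         # \u escapes yield UTF-8 (U+3046 is the character from the report)
    external: Encoding.default_external, # usually derived from the locale / code page
    internal: Encoding.default_internal  # nil unless configured
  }
end

encoding_report.each { |name, enc| puts "#{name}: #{enc.inspect}" }
```

Saving this under any file name, running it with the JRuby in question, and then pasting the same lines into irb should show which of these values differs between the two environments.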
@rovf I don't know whether this is an irb-related issue or an issue with our implementation when doing output to a console window in 1.7.x. However, development on 1.7.x is due to be shut down pretty soon, so I recommend trying this with 9.x and seeing if you still have issues.
No updates against 9.x, so I'm closing this as invalid. If there are still issues here, please open a new bug.
jruby 1.7.26 (1.9.3p551) 2016-08-26 69763b8 on Java HotSpot(TM) 64-Bit Server VM 1.7.0_79-b15 +jit [Windows 7-amd64]
In irb on Windows, I tried
'abc'.encoding
and got this answer:
#<Encoding:Windows-1252>
This already came as a surprise, because I thought that strings would be UTF-8 by default. Next, I tried a string containing a single Japanese character:
'う'.encoding
Here too, I got #<Encoding:Windows-1252> as the response, and this surely can't be right, because う can't be represented in the Windows-1252 character set.
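The claim that this character has no Windows-1252 representation can be checked directly by attempting a transcode. This is a small sketch; `representable_in?` is a hypothetical helper name, and the character is written as the escape `\u3046` so that the snippet does not depend on the source or console encoding.

```ruby
# Hypothetical helper: true if the string can be transcoded to the
# given encoding, false if some character has no mapping there.
def representable_in?(string, encoding)
  string.encode(encoding)
  true
rescue Encoding::UndefinedConversionError, Encoding::InvalidByteSequenceError
  false
end

hiragana_u = "\u3046" # U+3046, the character from the report, as UTF-8

puts representable_in?('abc', Encoding::Windows_1252)
puts representable_in?(hiragana_u, Encoding::Windows_1252)
```

The first call should succeed and the second should fail, since Windows-1252 is a single-byte code page without hiragana.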
I think the wrong encoding object is returned from the string.
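One way to tell a wrong label apart from wrong bytes is to look at the raw bytes alongside the reported encoding. The sketch below does not reproduce what JRuby does in irb; it only illustrates, under the assumption that the bytes are intact UTF-8, what a string looks like when the same bytes carry a Windows-1252 label instead.

```ruby
utf8      = "\u3046"                                     # U+3046 as a UTF-8 string
relabeled = utf8.dup.force_encoding(Encoding::Windows_1252)

p utf8.bytes.to_a       # the three UTF-8 bytes of the character
p relabeled.bytes.to_a  # same bytes, only the encoding label changed
p utf8.length           # counted as one character under UTF-8
p relabeled.length      # counted as three single-byte characters
```

If the string typed into irb shows three bytes and a length of 3, the bytes are UTF-8 with a mismatched label. A single byte would instead suggest the console input was transcoded before it reached the string.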