JRuby 1.7.26: String#encoding returns wrong encoding? #4452

rovf · 2017-01-24T06:58:18Z

jruby 1.7.26 (1.9.3p551) 2016-08-26 69763b8 on Java HotSpot(TM) 64-Bit Server VM 1.7.0_79-b15 +jit [Windows 7-amd64]

In irb on Windows, I tried

'abc'.encoding

and got as answer

#<Encoding:Windows-1252>

This already came as a surprise, because I thought that strings would be UTF-8 by default. Next, I tried a string containing a single Japanese character:

'う'.encoding

Here too, I got #<Encoding:Windows-1252> as a response, and this surely can't be true, because う can't be represented in the Windows-1252 character set.

I think the wrong encoding object is returned for the string.

headius · 2017-01-24T20:36:12Z

You would want to compare this to CRuby 1.9.3. I believe the change to make all strings be UTF-8 by default did not arrive until at least Ruby 2.0, and possibly later than that.

If this works the same as CRuby 1.9.3, there's nothing to fix. Hopefully it works properly in JRuby 9.1.7.0, since that is the maintained line. Can you confirm both of those?

rovf · 2017-01-25T08:05:04Z

Actually, the feature of having each String carry its own encoding was introduced in CRuby 1.9. See for instance http://nuclearsquid.com/writings/ruby-1-9-encodings/ and http://ruby-doc.org/core-1.9.3/String.html#method-i-encoding . I don't think the bug is about using the wrong encoding internally (otherwise using, e.g., Japanese characters in my string wouldn't work correctly), but rather that String#encoding does not return the correct information. It would be helpful to compare this with CRuby 1.9.3, but I don't have one here.

BTW, I tried the same with jruby-9.0.4.0 (also on Windows) and got the same behaviour, but since we, unfortunately, have to work with JRuby 1.7, I'm also interested in getting it fixed for this version.

enebo · 2017-01-26T15:47:47Z

@rovf I think you misunderstood what headius said. He was not saying that m17n was added in Ruby 2.0, but that UTF-8 became the default encoding for strings in Ruby 2.0 (and for some reason I thought it was 2.1). So MRI 1.9.3, which JRuby 1.7 emulates, may end up defaulting to Windows-1252 for ordinary ASCII values. I guess we need someone to find a Windows install of 1.9.3 and verify what it should be.

For the Japanese character, I do find this really odd. I know there is magic on Windows for translating console output to CP-1252, but I find it weird that it is reporting that encoding back.
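As an illustration of how a string can report Windows-1252 while still holding the bytes of a character that Windows-1252 cannot represent, here is a minimal sketch. It does not claim to reproduce what irb does internally; it only shows that the encoding reported by String#encoding is a label attached to the bytes, and that changing the label leaves the bytes untouched. The helper name `relabel` is made up for this example.

```ruby
# encoding: utf-8

# Hypothetical helper: returns a copy of str carrying a different
# encoding label. force_encoding does not transcode; the bytes stay as-is.
def relabel(str, encoding_name)
  str.dup.force_encoding(encoding_name)
end

utf8      = "\u3046"                        # う, built via an escape so it is UTF-8
relabeled = relabel(utf8, 'Windows-1252')

puts utf8.encoding                          # encoding label of the original
puts relabeled.encoding                     # encoding label of the copy
puts utf8.bytes.to_a == relabeled.bytes.to_a # same bytes under both labels
puts utf8.length                            # one character when read as UTF-8
puts relabeled.length                       # one character per byte in a single-byte encoding
```

If something like this is happening in the irb session, the literal would still round-trip to the console correctly even though the reported encoding looks wrong.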

@rovf I have a second request: can you put that code into a file and run the file instead of doing this via irb? I am just curious whether the behavior is different.
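A file-based check along these lines might look like the sketch below. The file name and the helper `encoding_report` are made up for this example, and the magic comment on the first line is an assumption: Ruby 1.9-compatible interpreters may reject a non-ASCII literal in a source file without it.

```ruby
# encoding: utf-8
# Hypothetical file name: encoding_check.rb (run with `jruby encoding_check.rb`).

# Returns the encoding label and the raw bytes (as hex) of a string.
def encoding_report(str)
  {
    encoding: str.encoding.name,
    bytes: str.bytes.map { |b| format('%02X', b) }.join(' ')
  }
end

puts "source encoding:  #{__ENCODING__}"
puts "default external: #{Encoding.default_external}"
puts "abc: #{encoding_report('abc').inspect}"
puts "Japanese literal: #{encoding_report('う').inspect}"
```

Evaluating the same `__ENCODING__` and `Encoding.default_external` expressions inside irb gives a direct comparison between the two environments.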

headius · 2017-01-26T22:38:49Z

@enebo The UTF-8 default change was in 2.x; I don't remember exactly which release. I just wanted to point out that it definitely wasn't 1.9 (and therefore we may have correct behavior here).

@rovf Can you try JRuby 9.1.7.0? JRuby 9.0.4.0 is over a year old at this point.

rovf · 2017-01-27T09:07:35Z

@enebo: You were right! When running from a file, both 1.7 and 9.x correctly show UTF-8 for both cases, so it really is only an irb issue!

Should this be a bug against irb? In this case, I would install JRuby 9.1.7.0 as @headius suggested; but if you think that within irb, the behaviour is OK, we can close the issue.

enebo · 2017-01-27T19:10:24Z

@rovf I don't know whether this is an irb-related issue or an issue with our implementation when doing output to a console window in 1.7.x. However, 1.7.x development is due to be shut down pretty soon, so I recommend trying this with 9.x and seeing if you still have issues.

headius · 2020-07-18T03:55:59Z

No updates against 9.x so I'm closing this as invalid.

If there are still issues here, please open a new bug.

kares added this to the Won't Fix milestone Jun 23, 2017

kares added encoding JRuby 1.7.x labels Jun 23, 2017

headius closed this as completed Jul 18, 2020

headius modified the milestones: Won't Fix, Invalid or Duplicate Jul 18, 2020
