I found that JRuby mangles the string encoding when writing to $stdout. I discovered the issue in v1.7.4, but it also exists in v1.7.11, which I just updated to in an attempt to fix it.
I'm still trying to put together a small test that reproduces the issue, which is a bit challenging because all our existing small tests for this sort of issue pass.
Filling in what I have so far, though, my script is pretty simple:
puts "Copyright \u00A9"
My writer is essentially a StringWriter. Actually it's a DocumentWriter, but that probably doesn't matter. In any case, it's a Writer, so I definitely don't expect to be subject to encoding issues like this.
What I actually get on the writer is:
Copyright ��
Adding #encoding: UTF-8 to the top of the script makes no difference. The problem also only occurs when file.encoding is set to something other than UTF-8 (so for practical purposes, only Windows is affected).
What I can see in the debugger:
RubyIO#write():1408 has str set correctly. It then calls getByteList() and receives the correct UTF-8 bytes.
The bytes travel unharmed through ChannelStream, ChannelDescriptor, Channels$WritableByteChannelImpl, arriving at a PrintStream.
This PrintStream contains a WriterOutputStream with the encoding set to US-ASCII.
So somewhere in JRuby, a WriterOutputStream is being created with the wrong encoding, thus mangling my bytes on the way back to characters.
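To illustrate the kind of mangling described above, here is a minimal sketch in plain Java that does not involve JRuby at all. It encodes the string as UTF-8 and then decodes the bytes with a different charset, which is roughly what a byte-to-char adapter configured for US-ASCII would do. The class and method names are made up for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of JRuby.
class EncodingMismatchDemo {
    // Encode the text as UTF-8, then decode those bytes with the given charset.
    static String roundTrip(String text, Charset decodeWith) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, decodeWith);
    }

    public static void main(String[] args) {
        String original = "Copyright \u00A9";

        // Decoding with the same charset restores the original string.
        String ok = roundTrip(original, StandardCharsets.UTF_8);
        System.out.println(ok.equals(original)); // prints "true"

        // The copyright sign is two bytes in UTF-8 (0xC2 0xA9). Neither byte is
        // valid US-ASCII, so each one becomes the replacement character U+FFFD.
        String mangled = roundTrip(original, StandardCharsets.US_ASCII);
        System.out.println(mangled.equals("Copyright \uFFFD\uFFFD")); // prints "true"
    }
}
```

The two replacement characters match the output I see on the writer.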
Edit:
The issue seems to be that Utils.getRubyIO is creating the WriterOutputStream without specifying the encoding. So even though the correct bytes are written to the OutputStream, this writer then corrupts them on the way back to the Writer I passed in.
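As a rough sketch of what passing the encoding explicitly could look like, the following stand-alone adapter forwards bytes to a Writer using a charset supplied by the caller. This is not JRuby's WriterOutputStream, and I have not checked which constructors that class offers. ExplicitCharsetWriterStream and its demo method are hypothetical names used only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical adapter; an assumption-laden sketch, not JRuby code.
class ExplicitCharsetWriterStream extends OutputStream {
    private final Writer target;
    private final Charset charset;
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();

    ExplicitCharsetWriterStream(Writer target, Charset charset) {
        this.target = target;
        this.charset = charset;
    }

    @Override
    public void write(int b) {
        pending.write(b); // buffer raw bytes until the next flush
    }

    @Override
    public void flush() throws IOException {
        // Simplification: decodes everything buffered so far, so a flush in the
        // middle of a multi-byte sequence would still corrupt that character.
        target.write(new String(pending.toByteArray(), charset));
        pending.reset();
        target.flush();
    }

    @Override
    public void close() throws IOException {
        flush();
        target.close();
    }

    // Writes the UTF-8 bytes of the test string through the adapter and
    // returns what arrives at the Writer.
    static String demo(Charset charset) {
        StringWriter sink = new StringWriter();
        try (OutputStream out = new ExplicitCharsetWriterStream(sink, charset)) {
            out.write("Copyright \u00A9".getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toString();
    }
}
```

With this sketch, demo(StandardCharsets.UTF_8) returns the original string, while demo(StandardCharsets.US_ASCII) reproduces the corrupted output. The point is only that the charset used to decode must match the charset used to produce the bytes.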
Now that I have looked at our existing tests carefully, I see that the passing test we had was really an expected-failing test. So there might be a report about this, perhaps even from me, on one of the trackers already.
Test cases were simpler than expected.