Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Truffle] Decode the path parameter to open #3202

Closed
wants to merge 1 commit into from

Conversation

bbelleville
Copy link
Contributor

This allows the path to contain characters not represented by
ISO-8859-1, which is the default encoding used for ByteLists.

Here is a test case to expose this problem:

filename = "こんにちは.txt"
File.open(filename, "w").close
File.delete(filename)

This results in an error: "No such file or directory". Delete (unlink) does do the decoding, but always assumes utf-8. Calling toString will use the encoding specified by the backing ByteList. It seems like some functions decode utf-8 always, others call toString.

This allows the path to contain characters not represented by
ISO-8859-1, which is the default encoding used for ByteLists.
@chrisseaton
Copy link
Contributor

How did you find this problem? Do file paths have an encoding? Aren't they just bytes?

Your solution isn't ideal - we don't want to call toString on RubyBasicObject as these classes are going away and it's not totally static anyway. Even if we did use it, we'd need @TruffleBoundary as we don't want to PE the entire encoding logic path.

We can fix that easily, but I'm still not sure what the problem is here or why this solves it. Can you elaborate?

@chrisseaton chrisseaton added this to the truffe milestone Jul 31, 2015
@chrisseaton chrisseaton self-assigned this Jul 31, 2015
@bbelleville
Copy link
Contributor Author

I found this problem while debugging another failure. The failure was in chdir_spec, and the reason it was failing was because of the way the runtime of the AOT compilation system was handling conversion of a C string to a Java string. In DirSpecs.create_mock_dirs it creates a file with this name.

To the operating system the filenames are just bytes, but if you have a ruby string representing that filename, you need to make sure to always send the same bytes to the operating system. The issue is that, as the code is, if you create a file using File.open named "こんにちは.txt", the bytes sent to the operating system don't correspond to "こんにちは.txt" in any possible character encoding. This is because the ByteList is being passed as a CharSequence, and it appears that jnr-ffi calls toString on that (that is what I do in the AOT system, and get the exact same results). The ByteList toString method decodes the bytes as ISO-8859-1, which doesn't support Japanese characters. The resulting java String is then encoded as bytes to create a C string, probably using whatever is the default encoding on the platform. This is what the operating system will see, on my system that results in a file named: "ã??ã??ã?«ã?¡ã?¯.txt"

I didn't think the solution I have is ideal, but I wanted to make you aware of the problem, since it was only by chance that I stumbled upon it.

@eregon
Copy link
Member

eregon commented Aug 1, 2015

Maybe ByteList.toString() is to blame here since it has the encoding?
Anyway, treating a ByteList as a CharSequence is not going to work well if there is no or arbitrary transcoding not respecting the ByteList encoding (like ByteList.toString()) happening.

@eregon
Copy link
Member

eregon commented Aug 1, 2015

JRuby's RubyString.toString() ends in Helpers.decodeByteList() which does the right thing.
Therefore I think we should have StringNodes.toString() which does the same and use that (Object.toString() was practical but won't work once everything is DynamicObject I suppose).

@chrisseaton
Copy link
Contributor

Ok, I'll still don't quite get it. But can you add an @TruffleBoundary and a comment explaining what is going on and we'll merge.

I thought JNR would be doing the right thing with a ByteList automatically.

@eregon
Copy link
Member

eregon commented Aug 1, 2015

@chrisseaton I think it is as simple as ByteList.toString() just ignoring the ByteList encoding and do some irrelevant transcoding. This is probably worth discussing over to potentially fix ByteList.toString().

@eregon eregon closed this in ddcadaf Aug 2, 2015
@eregon
Copy link
Member

eregon commented Aug 2, 2015

I fixed it in ddcadaf.

@bbelleville bbelleville deleted the open branch August 6, 2015 18:07
@chrisseaton chrisseaton modified the milestones: truffe, truffle-dev Dec 19, 2015
@enebo enebo added this to the Non-Release milestone Dec 7, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants