Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multibyte identifiers not marshaled correctly #3688

Closed
headius opened this issue Feb 19, 2016 · 3 comments
Closed

Multibyte identifiers not marshaled correctly #3688

headius opened this issue Feb 19, 2016 · 3 comments

Comments

@headius
Copy link
Member

headius commented Feb 19, 2016

Environment

JRuby 9.0.5.0 (and likely 9.1.0.0)

Expected Behavior

When a class or instance variable has multibyte characters, it needs to marshal that name as though it were an encoded symbol. Basically, where we currently dump a "plain" bytes version of the identifer, we need to dump it with the "encoding instance variable" logic used for proper symbols. This way we can reconstitute the original name and match it up properly on the load side.

Actual Behavior

In actuality, we are stripping the multibyte nature of those names, and they come out mangled on the other side.

Given this script (based on MRI TestMarshal#test_unloadable_userdef:

class Foo
  def bar
    c = eval "
      class X\u{23F0 23F3} < Time
        class << self
          undef _load
        end
        self
      end"
    o = c.new
    p Marshal.load(Marshal.dump(o))
  end
end
Foo.new.bar

We currently get the following marshaled output (note the mangled name of the class):

"\x04\bIu:\rFoo::X??\rn\x06\x1D\x80P.\x04\xE7\b:\rsubmicro\"\x06\x00:\voffseti\xFE\xA0\xAB:\tzoneI\"\bCST\x06:\x06ET"

I attempted to fix this by actually creating symbols for these elements, but there's something missing on the loading side. Here's my patch: https://gist.github.com/headius/99779f55add2d007ca42

The marshaled output looks better:

"\x04\bIuI:\x11Foo::X\xE2\x8F\xB0\xE2\x8F\xB3\x06:\x06ET\rn\x06\x1D\x80=\xF5-\xC5\a:\voffseti\xFE\xA0\xAB:\tzoneI\"\bCST\x06;\x06F"

But I still get an error from the full script above:

ArgumentError: undefined class/module Foo::X��
   load at org/jruby/RubyMarshal.java:145
    bar at -e:1
  <top> at -e:1

We are not reconstituting the string properly, it seems.

Since this is not a regression or new behavior, I'm going to exclude the related CRuby 2.3 tests for now.

@headius
Copy link
Member Author

headius commented Apr 27, 2017

Circling back to this because we're fixing up symbol encoding stuff.

If I run this as-is today on 9.1.9.0 (master) I get this:

$ jruby blah.rb
ArgumentError: undefined class/module Foo::X??
    load at org/jruby/RubyMarshal.java:145
     bar at blah.rb:11
  <main> at blah.rb:14

The multibyte chars are at least rendering as question marks now, but the error remains.

If I run this without the eval (and use ) as the name of the class, the error changes to a marshaling error:

$ jruby blah.rb
TypeError: class Foo::Xø needs to have method `_load'
         load at org/jruby/RubyMarshal.java:145
  <class:Foo> at blah.rb:11
       <main> at blah.rb:1

...which matches MRI. So it seems that the problem relates to the eval of the multibyte characters.

@headius
Copy link
Member Author

headius commented Apr 27, 2017

I suspect this will be fixed by getting the rest of identifiers to use proper bytelists, which @enebo is working on now.

@headius
Copy link
Member Author

headius commented May 16, 2018

This does not appear to be fixed in 9.2, but it may simply be a lack of symbol/identifier logic fixes for Marshal. @enebo maybe you can take a look at this?

Meanwhile I'm retargeting for 9.2.1.

@headius headius modified the milestones: JRuby 9.2.0.0, JRuby 9.2.1.0 May 16, 2018
@enebo enebo closed this as completed in f176877 Jun 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant