Cannot persist IR #3252

kylekyle · 2015-08-14T05:42:01Z

It appears that JRuby supports serializing a scope through the IRWriter class, but the persist method looks broken at the moment. I bumped into the issue while writing an extension for serializing a Proc:

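import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

import org.jruby.Ruby;
import org.jruby.RubyProc;
import org.jruby.anno.JRubyClass;
import org.jruby.anno.JRubyMethod;
import org.jruby.ir.IRClosure;
import org.jruby.ir.persistence.IRWriter;
import org.jruby.ir.persistence.IRWriterStream;
import org.jruby.runtime.Block;
import org.jruby.runtime.InterpretedIRBlockBody;
import org.jruby.runtime.ThreadContext;
import org.jruby.runtime.builtin.IRubyObject;
import org.jruby.runtime.load.BasicLibraryService;
import org.jruby.util.ByteList;

public class ProcToBytesService implements BasicLibraryService {

  @Override
  public boolean basicLoad(Ruby runtime) throws IOException {
    // Add Proc#to_bytes when the jar is required.
    runtime.getClass("Proc").defineAnnotatedMethod(ProcToBytes.class, "to_bytes");
    return true;
  }

  @JRubyClass(name = "ProcToBytes")
  public static class ProcToBytes {

    @JRubyMethod
    public static IRubyObject to_bytes(ThreadContext context, IRubyObject self) {
      Ruby ruby = context.getRuntime();
      Block block = ((RubyProc) self).getBlock();

      // Only interpreted block bodies are handled here (the test run uses -X-C to disable compilation).
      if (!(block.getBody() instanceof InterpretedIRBlockBody)) {
        throw ruby.newRuntimeError("Cannot serialize " + block.getBody().getClass().getName());
      }

      InterpretedIRBlockBody body = (InterpretedIRBlockBody) block.getBody();

      if (!(body.getIRScope() instanceof IRClosure)) {
        throw ruby.newRuntimeError("Cannot serialize " + body.getIRScope().getClass().getName());
      }

      IRClosure closure = (IRClosure) body.getIRScope();

      // Write the closure's IR into an in-memory buffer and return it as a Ruby String.
      try {
        try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
          try (ObjectOutputStream output = new ObjectOutputStream(baos)) {
            IRWriter.persist(new IRWriterStream(output), closure);
          }

          return ruby.newString(new ByteList(baos.toByteArray()));
        }
      } catch (IOException ex) {
        throw new RuntimeException(ex);
      }
    }
  }
}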

You can test the service with a trivial proc:

kylekyle$ jruby -X-C -r proc_to_bytes_service.jar -r proc_to_bytes -e "proc{puts 'hello'}.to_bytes"

But you get the following stack trace:

IRWriterAnalzer.java:169:in `getScopeID': java.lang.NullPointerException
    from IRWriterStream.java:185:in `encode'
    from IRWriter.java:86:in `persistScopeHeader'
    from IRWriter.java:61:in `persistScopeHeaders'
    from IRWriter.java:28:in `persist'
    from ProcToBytesService.java:62:in `to_bytes'
    from ProcToBytesService$ProcToBytes$INVOKER$s$0$0$to_bytes.gen:-1:in `call'
    from CachingCallSite.java:293:in `cacheAndCall'
    from CachingCallSite.java:131:in `call'
    from InterpreterEngine.java:305:in `processCall'
    from StartupInterpreterEngine.java:77:in `interpret'
    from Interpreter.java:116:in `INTERPRET_ROOT'
    from Interpreter.java:103:in `execute'
    from Interpreter.java:32:in `execute'
    from IRTranslator.java:42:in `execute'
    from Ruby.java:849:in `runInterpreter'
    from Ruby.java:854:in `runInterpreter'
    from Ruby.java:756:in `runNormally'
    from Ruby.java:573:in `runFromMain'
    from Main.java:403:in `doRunFromMain'
    from Main.java:298:in `internalRun'
    from Main.java:225:in `run'
    from Main.java:197:in `main'

Is the persist method expecting a different type of scope? Is there a different StreamWriter I should be using? Has anyone had success persisting a scope using IRWriter? Is there an easier way to serialize a JRuby proc?


enebo · 2015-08-14T16:22:41Z

I guess we need to think about this. A closure/proc cannot necessarily be persisted by itself because it likely has references to things in enclosing scopes (like an IRMethod). As such, I never considered this scenario directly. Even if things were changed so that you persisted the first non-closure parent scope and all nested closures, I am not sure this would work, since persistence was designed around persisting an IRScript or IREvalScript....

Not quite sure on this one.
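To make the idea of persisting "the first non-closure parent scope and all nested closures" concrete, here is a minimal sketch over a hypothetical, simplified scope tree. `Scope`, `PersistRoot`, and their methods are illustrative names only, not JRuby's `IRScope` API; real IR scopes carry far more state than a parent pointer and a closure flag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified lexical scope. NOT JRuby's IRScope.
final class Scope {
    final String name;
    final boolean isClosure;
    final Scope lexicalParent;            // null for the top-level script
    final List<Scope> children = new ArrayList<>();

    Scope(String name, boolean isClosure, Scope lexicalParent) {
        this.name = name;
        this.isClosure = isClosure;
        this.lexicalParent = lexicalParent;
        if (lexicalParent != null) lexicalParent.children.add(this);
    }
}

// Hypothetical helper: decide which scopes would have to be written out together.
final class PersistRoot {
    // Walk outward from a closure until reaching a scope that is not itself
    // a closure (for example a method or the script body).
    static Scope firstNonClosureAncestor(Scope scope) {
        Scope current = scope;
        while (current != null && current.isClosure) {
            current = current.lexicalParent;
        }
        return current; // null only if every enclosing scope was a closure
    }

    // The root plus every closure nested (directly or transitively through
    // other closures) inside it, in depth-first order.
    static List<Scope> scopesToPersist(Scope closure) {
        List<Scope> out = new ArrayList<>();
        Scope root = firstNonClosureAncestor(closure);
        if (root != null) collect(root, out);
        return out;
    }

    private static void collect(Scope scope, List<Scope> out) {
        out.add(scope);
        for (Scope child : scope.children) {
            if (child.isClosure) collect(child, out); // nested methods are skipped
        }
    }
}
```

Note that this pulls in sibling closures of the enclosing method too, which is part of why persisting a single proc this way drags in more than the proc itself.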

kylekyle · 2015-08-16T20:04:39Z

Not sure what the internals looked like, but it seems that @headius had this working once upon a time.

enebo · 2015-08-16T20:42:37Z

@kylekyle yeah, I remember this experiment. I think that example was pretty scary in that if anything lexically containing it was different, the code would explode. I guess if we examined all constants and containing depth > 0 variables we could perhaps safe-guard reloading the persisted closure and give an error message if it was being reloaded into an invalid containing scope.
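A rough sketch of that reload safeguard, assuming the writer could record, next to the persisted closure, which depth > 0 variables and which constants it references. `ClosureManifest` and its methods are hypothetical; nothing here corresponds to an existing IRWriter/IRReader API. The receiving scope is modeled simply as the set of variable names visible at each depth plus the set of defined constants.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical manifest stored alongside a persisted closure (illustrative only).
final class ClosureManifest {
    // Captured variables living in an enclosing scope: name -> depth (>= 1).
    final Map<String, Integer> outerVariables;
    // Constants the closure body references by name.
    final Set<String> constants;

    ClosureManifest(Map<String, Integer> outerVariables, Set<String> constants) {
        this.outerVariables = outerVariables;
        this.constants = constants;
    }

    // enclosingVariables.get(d - 1) holds the names visible at depth d in the
    // scope the closure is about to be reloaded into.
    List<String> problemsReloadingInto(List<Set<String>> enclosingVariables,
                                       Set<String> definedConstants) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Integer> e : outerVariables.entrySet()) {
            int depth = e.getValue();
            boolean found = depth >= 1 && depth <= enclosingVariables.size()
                    && enclosingVariables.get(depth - 1).contains(e.getKey());
            if (!found) problems.add("missing variable '" + e.getKey() + "' at depth " + depth);
        }
        for (String c : constants) {
            if (!definedConstants.contains(c)) problems.add("undefined constant " + c);
        }
        return problems;
    }

    // Fail loudly instead of letting the reloaded code "explode" later.
    void checkReloadInto(List<Set<String>> enclosingVariables, Set<String> definedConstants) {
        List<String> problems = problemsReloadingInto(enclosingVariables, definedConstants);
        if (!problems.isEmpty()) {
            throw new IllegalStateException("Cannot reload closure here: " + String.join("; ", problems));
        }
    }
}
```

Under this scheme a closure like `{|word| [word,1]}` from the word count example further down would have an empty manifest and would pass the check in any scope.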

I am interested in the use case you are looking to solve. The mechanism used in that experiment is one way; another would be to encapsulate and persist any contained state at the same time and restore it on reload. The problem with encapsulating that extra state is that it becomes a slippery slope: we may be persisting the state of an object which would not exist in a new runtime, so you would then need to persist type info (or restrict it to a common subset of well-defined types, e.g. String/Array). I think this is why that experiment assumed you were reloading into a similar environment.
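The "common subset of well-defined types" restriction could be enforced with a check run over captured state before persisting it. The sketch below is hypothetical and uses Java types as stand-ins for Ruby ones (String for String, Long for Integer, Double for Float, List for Array, Map for Hash); a real implementation would inspect Ruby objects instead.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical allowlist check for captured state (illustrative only).
final class CapturedState {
    static boolean isPortable(Object value) {
        return isPortable(value, Collections.newSetFromMap(new IdentityHashMap<>()));
    }

    // 'path' holds the containers currently being visited, so a self-referencing
    // list or map is rejected instead of recursing forever.
    private static boolean isPortable(Object value, Set<Object> path) {
        if (value == null || value instanceof String || value instanceof Long
                || value instanceof Double || value instanceof Boolean) {
            return true;
        }
        if (value instanceof List || value instanceof Map) {
            if (!path.add(value)) return false; // cycle detected
            try {
                if (value instanceof List) {
                    for (Object element : (List<?>) value) {
                        if (!isPortable(element, path)) return false;
                    }
                } else {
                    for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                        if (!isPortable(e.getKey(), path) || !isPortable(e.getValue(), path)) return false;
                    }
                }
                return true;
            } finally {
                path.remove(value);
            }
        }
        // Anything else (arbitrary objects, IO handles, other procs...) may not
        // exist in the receiving runtime, so refuse it rather than guess.
        return false;
    }
}
```

Refusing unknown types keeps the format small and predictable; the cost is that any closure capturing a user-defined object simply cannot be persisted.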

I think we can do something to make closures persist but it would be cool to get more clarification on how you envision it working.

kylekyle · 2015-08-16T22:40:26Z

I do a lot of work in Spark. In my opinion, the only thing that would make Spark better is Ruby support. Unfortunately, Spark gets its flexibility from the ability to serialize closures (especially ones defined interactively in a shell session), send them to executors, and get back immediate results.

The Ruby Spark project has made some headway toward Ruby support in Spark, but there are some pretty sharp edges since it uses Sourcify to marshal procs to Strings where it can.

A properly serializable proc would make Ruby integration with Spark's Java API nearly seamless. The closures we want to serialize are typically trivial:

# a word count example
file = context.text_file 'hdfs://.../'
file.flat_map(&:split).map{|word| [word,1]}.reduce_by_key(&:+)

None of those closures require any extra state.

> I guess if we examined all constants and containing depth > 0 variables we could perhaps safe-guard reloading the persisted closure and give an error message if it was being reloaded into an invalid containing scope.

That would be perfect for what I'm trying to do.

kylekyle · 2015-08-19T21:56:58Z

The code that supported headius' gist wouldn't happen to be lying around somewhere, would it? If I had a starting place, I could start working on a fork of the IRWriter.persist method.

enebo · 2015-08-19T22:46:02Z

Our runtime in 9k is completely different, so hacking the entry point you found would be the place to start.

tomz · 2015-11-13T07:41:34Z

@kylekyle Any progress on this? - I think this will make Spark so much better and easier to use

kylekyle · 2015-11-14T02:59:49Z

@tomz: I got a first pass finished, but it's super buggy and I'm starting to think this approach won't be viable. The easiest thing might just be to use a Rubinius driver to serialize procs, since Rubinius supports that natively. Everything else would remain JRuby.

Another alternative would be to use Truffle, which I believe (correct me if I'm wrong) might also have native support for serializing a proc. The only problem is that the standard library hasn't been completely implemented in Truffle, so a lot of things just don't work yet (like pry).

Kind of a setback, but I haven't given up yet. I'll update this ticket if there's any news.

tomz · 2015-11-15T02:22:46Z

@kylekyle thanks for the update. I also tweeted to @jruby and @headius about this issue.

headius · 2020-07-08T22:16:59Z

I believe JRuby 9.2.11.0 included a large chunk of work to make IR serialization work better. It's not officially a supported feature, but this issue should no longer be valid. Marking as fixed in 9.2.11.0.

headius closed this as completed Jul 8, 2020

headius added this to the JRuby 9.2.11.0 milestone Jul 8, 2020
