
Cannot persist IR #3252

Closed
kylekyle opened this issue Aug 14, 2015 · 10 comments

@kylekyle

It appears that JRuby supports serializing a scope through the IRWriter class, but the persist method looks broken at the moment. I bumped into the issue while writing an extension for serializing a Proc:

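import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

import org.jruby.Ruby;
import org.jruby.RubyProc;
import org.jruby.anno.JRubyClass;
import org.jruby.anno.JRubyMethod;
import org.jruby.ir.IRClosure;
import org.jruby.ir.persistence.IRWriter;
import org.jruby.ir.persistence.IRWriterStream;
import org.jruby.runtime.Block;
import org.jruby.runtime.InterpretedIRBlockBody;
import org.jruby.runtime.ThreadContext;
import org.jruby.runtime.builtin.IRubyObject;
import org.jruby.runtime.load.BasicLibraryService;
import org.jruby.util.ByteList;

// Adds Proc#to_bytes, which tries to persist the proc's IRClosure via IRWriter.
public class ProcToBytesService implements BasicLibraryService {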

  @Override
  public boolean basicLoad(Ruby runtime) throws IOException {
    runtime.getClass("Proc").defineAnnotatedMethod(ProcToBytes.class, "to_bytes");
    return true;
  }

  @JRubyClass(name = "ProcToBytes")
  public static class ProcToBytes {

    @JRubyMethod
    public static IRubyObject to_bytes(ThreadContext context, IRubyObject self) {
      Ruby ruby = context.getRuntime();
      Block block = ((RubyProc) self).getBlock();

      if (!(block.getBody() instanceof InterpretedIRBlockBody)) {
        throw ruby.newRuntimeError("Cannot serialize " + block.getBody().getClass().getName());
      }

      InterpretedIRBlockBody body = (InterpretedIRBlockBody) block.getBody();

      if (!(body.getIRScope() instanceof IRClosure)) {
        throw ruby.newRuntimeError("Cannot serialize " + body.getIRScope().getClass().getName());
      }

      IRClosure closure = (IRClosure) body.getIRScope();

      try {
        try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
          try (ObjectOutputStream output = new ObjectOutputStream(baos)) {
            IRWriter.persist(new IRWriterStream(output), closure);
          }

          return ruby.newString(new ByteList(baos.toByteArray()));
        }
      } catch (IOException ex) {
        throw new RuntimeException(ex);
      }
    }
  }
}

You can test the service with a trivial proc:

kylekyle$ jruby -X-C -r proc_to_bytes_service.jar -r proc_to_bytes -e "proc{puts 'hello'}.to_bytes"

But you get the following stack trace:

IRWriterAnalzer.java:169:in `getScopeID': java.lang.NullPointerException
    from IRWriterStream.java:185:in `encode'
    from IRWriter.java:86:in `persistScopeHeader'
    from IRWriter.java:61:in `persistScopeHeaders'
    from IRWriter.java:28:in `persist'
    from ProcToBytesService.java:62:in `to_bytes'
    from ProcToBytesService$ProcToBytes$INVOKER$s$0$0$to_bytes.gen:-1:in `call'
    from CachingCallSite.java:293:in `cacheAndCall'
    from CachingCallSite.java:131:in `call'
    from InterpreterEngine.java:305:in `processCall'
    from StartupInterpreterEngine.java:77:in `interpret'
    from Interpreter.java:116:in `INTERPRET_ROOT'
    from Interpreter.java:103:in `execute'
    from Interpreter.java:32:in `execute'
    from IRTranslator.java:42:in `execute'
    from Ruby.java:849:in `runInterpreter'
    from Ruby.java:854:in `runInterpreter'
    from Ruby.java:756:in `runNormally'
    from Ruby.java:573:in `runFromMain'
    from Main.java:403:in `doRunFromMain'
    from Main.java:298:in `internalRun'
    from Main.java:225:in `run'
    from Main.java:197:in `main'

Is the persist method expecting a different type of scope? Is there a different StreamWriter I should be using? Has anyone had success persisting a scope using IRWriter? Is there an easier way to serialize a JRuby proc?

@enebo
Member

enebo commented Aug 14, 2015

I guess we need to think about this. A closure/proc cannot necessarily be persisted by itself because it likely has references to things in enclosing scopes (like an IRMethod). As such, I never considered this scenario directly. Even if things were changed so that you persisted the first non-closure parent scope and all nested closures, I am not sure this would work, since persistence was designed around persisting an IRScript or IREvalScript.
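To make the "persist the first non-closure parent scope and all nested closures" idea concrete, here is a minimal sketch on a hypothetical toy scope tree. The `Scope` class and the method names are illustrative assumptions, not JRuby's `IRScope`/`IRClosure` API; the sketch only decides which scopes would have to be written out together.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of nested scopes; these are NOT JRuby's IRScope classes.
public class PersistUnit {
    static final class Scope {
        final String name;
        final boolean isClosure;
        final Scope parent;
        final List<Scope> children = new ArrayList<>();

        Scope(String name, boolean isClosure, Scope parent) {
            this.name = name;
            this.isClosure = isClosure;
            this.parent = parent;
            if (parent != null) {
                parent.children.add(this); // register as a nested scope of the parent
            }
        }
    }

    // Walk outward until we reach a scope that is not itself a closure (e.g. a method).
    static Scope firstNonClosureAncestor(Scope scope) {
        Scope current = scope;
        while (current.isClosure && current.parent != null) {
            current = current.parent;
        }
        return current;
    }

    // Everything that would be written out to persist `closure`: the enclosing
    // non-closure scope plus all scopes nested in it, in depth-first pre-order.
    static List<String> persistenceUnit(Scope closure) {
        List<String> names = new ArrayList<>();
        collect(firstNonClosureAncestor(closure), names);
        return names;
    }

    private static void collect(Scope scope, List<String> names) {
        names.add(scope.name);
        for (Scope child : scope.children) {
            collect(child, names);
        }
    }

    public static void main(String[] args) {
        Scope script = new Scope("script", false, null);
        Scope method = new Scope("method foo", false, script);
        Scope block1 = new Scope("block 1", true, method);
        Scope block2 = new Scope("block 2", true, block1);
        System.out.println(persistenceUnit(block2)); // [method foo, block 1, block 2]
    }
}
```

This only answers what would be written; it does not address the concern above that the persistence format expects a script-level scope as its root.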

Not quite sure on this one.

@kylekyle
Author

Not sure what the internals looked like, but it seems that @headius had this working once upon a time.

@enebo
Member

enebo commented Aug 16, 2015

@kylekyle yeah, I remember this experiment. I think that example was pretty scary in that if anything lexically containing it was different, the code would explode. I guess if we examined all constants and containing depth > 0 variables we could perhaps safe-guard reloading the persisted closure and give an error message if it was being reloaded into an invalid containing scope.
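A rough sketch of that safeguard, assuming the persisted closure carries a list of the variables it reads (with their lexical depth, 0 meaning its own scope) plus the constants it references. `ReloadGuard`, `VarRef` and `verify` are hypothetical names for illustration only, not JRuby APIs.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch; class and method names are illustrative, not JRuby APIs.
public class ReloadGuard {
    // A variable the closure reads, resolved `depth` scopes outward (0 = its own scope).
    static final class VarRef {
        final String name;
        final int depth;

        VarRef(String name, int depth) {
            this.name = name;
            this.depth = depth;
        }
    }

    // Checks, at reload time, that the target environment provides everything the
    // persisted closure referenced. enclosingVars.get(i) lists the variable names
    // visible at depth i + 1; `constants` lists constants defined in the target runtime.
    static void verify(List<VarRef> vars, Set<String> requiredConstants,
                       List<Set<String>> enclosingVars, Set<String> constants) {
        for (VarRef ref : vars) {
            if (ref.depth == 0) {
                continue; // the closure's own locals travel with it
            }
            int index = ref.depth - 1;
            if (index >= enclosingVars.size() || !enclosingVars.get(index).contains(ref.name)) {
                throw new IllegalStateException("cannot reload closure: variable '" + ref.name
                        + "' not found at depth " + ref.depth);
            }
        }
        for (String constant : requiredConstants) {
            if (!constants.contains(constant)) {
                throw new IllegalStateException("cannot reload closure: constant " + constant
                        + " is not defined");
            }
        }
    }

    public static void main(String[] args) {
        List<VarRef> vars = List.of(new VarRef("word", 0), new VarRef("total", 1));
        Set<String> needed = Set.of("String");

        // Compatible containing scope: no exception.
        verify(vars, needed, List.of(Set.of("total")), Set.of("String", "Array"));

        // Incompatible containing scope: a clear error instead of a later crash.
        try {
            verify(vars, needed, List.of(Set.of("other")), Set.of("String"));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // cannot reload closure: variable 'total' not found at depth 1
        }
    }
}
```

The point of the design is to fail at load time with a specific message, rather than letting a mismatched containing scope "explode" later during execution.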

I am interested in the use case you are looking to solve. The mechanism used in that experiment is one way; another would be to encapsulate and persist any contained state at the same time and restore it. The problem with encapsulating that extra state is that it becomes a slippery slope: we may be persisting the state of an object which would not exist in a new runtime, so you would then need to persist type info (or restrict it to a common subset of well-defined types, e.g. String/Array). I think this is why that experiment assumed you were reloading into a similar environment.
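The "restrict to a common subset of well-defined types" option could look roughly like this sketch: before persisting a closure together with its captured values, check that each value is built only from whitelisted types. The whitelist (null, Boolean, Long, Double, String, and Lists/Maps of those) and all names here are assumptions for illustration, not anything JRuby defines.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the whitelist below is an assumption, not a JRuby rule.
public class CapturedStateCheck {
    // True if the value is built only from types assumed to mean the same thing in
    // any runtime: null, Boolean, Long, Double, String, and Lists/Maps of those.
    static boolean isPortable(Object value) {
        if (value == null || value instanceof Boolean || value instanceof Long
                || value instanceof Double || value instanceof String) {
            return true;
        }
        if (value instanceof List) {
            for (Object element : (List<?>) value) {
                if (!isPortable(element)) {
                    return false;
                }
            }
            return true;
        }
        if (value instanceof Map) {
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                if (!(entry.getKey() instanceof String) || !isPortable(entry.getValue())) {
                    return false;
                }
            }
            return true;
        }
        return false; // anything else would need its type info persisted as well
    }

    // Rejects persisting a closure whose captured variables hold non-portable values.
    static void checkCaptured(Map<String, Object> captured) {
        for (Map.Entry<String, Object> entry : captured.entrySet()) {
            if (!isPortable(entry.getValue())) {
                throw new IllegalArgumentException("cannot persist captured variable '"
                        + entry.getKey() + "' of type " + entry.getValue().getClass().getName());
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(isPortable(List.of("a", 1L, Map.of("k", 2.5)))); // true
        System.out.println(isPortable(new StringBuilder("x")));            // false
    }
}
```

Anything outside the whitelist is rejected up front, which avoids the slippery slope of persisting arbitrary object state and its type information.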

I think we can do something to make closures persist but it would be cool to get more clarification on how you envision it working.

@kylekyle
Author

I do a lot of work in Spark. In my opinion, the only thing that would make Spark better is Ruby support. Unfortunately, Spark gets its flexibility from the ability to serialize a closure (especially those defined interactively in a shell session) and send it to executors and get back immediate results.
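For context on what Spark expects from a closure: the function interfaces in Spark's Java API (e.g. `org.apache.spark.api.java.function.Function`) extend `java.io.Serializable`, so a Java lambda passed to them can be written to bytes and shipped to executors. The sketch below shows the same round trip using only JDK classes. An intersection cast stands in for Spark's interfaces, and `ClosureRoundTrip` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

public class ClosureRoundTrip {
    // Serialize any Serializable object (including a serializable lambda) to bytes.
    static byte[] toBytes(Object obj) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(baos)) {
            out.writeObject(obj);
        }
        return baos.toByteArray();
    }

    // Read the object back; for a lambda this re-links it to the capturing class.
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws Exception {
        // The intersection cast makes javac generate a serializable lambda.
        Function<String, Integer> length = (Function<String, Integer> & Serializable) String::length;
        Function<String, Integer> copy = (Function<String, Integer>) fromBytes(toBytes(length));
        System.out.println(copy.apply("hello")); // 5
    }
}
```

The serialized bytes do not contain the lambda's code. They only name the capturing class and the method that implements the lambda, so the receiving JVM must already have that class on its classpath. A Ruby proc has no such class, which is why its IR would have to travel with it.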

The Ruby Spark project has made some headway toward Ruby support in Spark, but there are some pretty sharp edges since it uses Sourcify to marshal procs to Strings where it can.

A properly serializable proc would make Ruby integration with Spark's Java API nearly seamless. The closures we want to serialize are typically trivial:

# a word count example
file = context.text_file 'hdfs://.../'
file.flat_map(&:split).map{|word| [word,1]}.reduce_by_key(&:+)

None of those closures require any extra state.
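For comparison, here is a local Java-streams rendering of the word-count pipeline above. It does not use Spark or the Ruby Spark gem's API; a `Collectors.toMap` merge stands in for `reduce_by_key`. Each of the three functions is a lambda or method reference that reads no enclosing local variable, which illustrates why none of them needs extra state to be serialized. `WordCount` and the constant names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

public class WordCount {
    // Ruby's `&:split`: split a line on runs of whitespace.
    static final Function<String, List<String>> SPLIT =
            line -> Arrays.asList(line.trim().split("\\s+"));
    // Ruby's `{|word| [word, 1]}`: pair each word with a count of one.
    static final Function<String, Map.Entry<String, Integer>> PAIR = word -> Map.entry(word, 1);
    // Ruby's `&:+` given to reduce_by_key: add two counts for the same word.
    static final BinaryOperator<Integer> ADD = Integer::sum;

    // Local stand-in for flat_map / map / reduce_by_key; results sorted by word.
    static Map<String, Integer> wordCount(List<String> lines) {
        return lines.stream()
                .flatMap(line -> SPLIT.apply(line).stream())
                .map(PAIR)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, ADD, TreeMap::new));
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("to be or", "not to be"))); // {be=2, not=1, or=1, to=2}
    }
}
```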

> I guess if we examined all constants and containing depth > 0 variables we could perhaps safe-guard reloading the persisted closure and give an error message if it was being reloaded into an invalid containing scope.

That would be perfect for what I'm trying to do.

@kylekyle
Author

The code that supported headius' gist wouldn't happen to be lying around somewhere, would it? If I had a starting place, I could start working on a fork of the IRWriter.persist method.

@enebo
Member

enebo commented Aug 19, 2015

Our runtime in 9k is completely different, so hacking the entry point you found would be the place to start.

@tomz

tomz commented Nov 13, 2015

@kylekyle Any progress on this? I think this would make Spark much better and easier to use.

@kylekyle
Copy link
Author

@tomz: I got a first pass finished, but it's super buggy and I'm starting to think this approach won't be viable. I think the easiest thing might just be to use a Rubinius driver to serialize procs, since Rubinius supports that natively. Everything else would remain JRuby.

Another alternative would be to use Truffle, which I believe (correct me if I'm wrong) might also have native support for serializing a proc. The only problem is that the standard library hasn't been completely implemented in Truffle yet, so a lot of things just don't work (like pry).

Kind of a setback, but I haven't given up yet. I'll update this ticket if there's any news.

@tomz

tomz commented Nov 15, 2015

@kylekyle thanks for the update. I also tweeted to @jruby and @headius about this issue.

@headius
Member

headius commented Jul 8, 2020

I believe JRuby 9.2.11.0 included a large chunk of work to make IR serialization work better. It's not officially a supported feature, but this issue should no longer be valid. Marking as fixed in 9.2.11.0.

headius closed this as completed Jul 8, 2020
headius added this to the JRuby 9.2.11.0 milestone Jul 8, 2020