
[Truffle] VM unexpectedly "cold" running the same methods (benchmarking) #3387

Closed

PragTob opened this issue Oct 11, 2015 · 4 comments


PragTob commented Oct 11, 2015

Me again ;)

Over at rubykon (currently still on a branch) I started writing a very simple benchmarking tool to benchmark the final use case, which is expected to take multiple seconds per run. Its code lives in lib/benchmark/avg. It runs the given block repeatedly until a time threshold is exceeded to warm up, and then runs it again until a second time threshold is reached (the "real" benchmark). I currently have debug output that prints the times of the individual samples, which made me notice that after the warmup Truffle is basically as slow as it was at the very beginning of the warmup.
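For readers who don't want to dig into the repo, a rough sketch of the scheme described above (my assumption from the description, not the actual lib/benchmark/avg code) looks something like this:

```ruby
# Assumed sketch: run the block repeatedly until a time budget is used
# up, recording one sample (the duration of one run) per iteration.
def run_phase(seconds, &block)
  samples = []
  deadline = Time.now + seconds
  while Time.now < deadline
    start = Time.now
    block.call
    samples << (Time.now - start)
  end
  samples
end

work = -> { 1_000.times { Math.sqrt(rand) } } # stand-in for the MCTS run
run_phase(0.05, &work)                        # warmup phase, samples discarded
samples = run_phase(0.05, &work)              # "real" benchmark samples
```

The same `run_phase` is used for both phases; only the time budget (and the banner printed between them) differs.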

This report is as of this commit

The behavior I'm seeing is the following:

tobi@tobi-desktop ~/github/rubykon $ jruby+truffle run --graal -J-Xmx1500m benchmark/mcts_avg.rb 
$ JAVACMD=../graalvm-jdk1.8.0/bin/java ../jruby/bin/jruby -X\+T -J-Xmx1500m -Xtruffle.core.load_path\=../jruby/truffle/src/main/ruby -r ./.jruby\+truffle_bundle/bundler/setup.rb benchmark/mcts_avg.rb
Warming up...
--------------------------------------------------------------------------------
[96.148, 16.649]
19x19 1_000 iterations        1.06 i/min  56.4 s (avg) (± 70.48%)

Running the benchmark...
--------------------------------------------------------------------------------
[95.89, 20.925]
19x19 1_000 iterations        1.03 i/min  58.41 s (avg) (± 64.17%)

If you look at the arrays printed there for debug, those are the sample times in seconds. It starts at 96 seconds in the warmup, then 16s. When it goes to the "real" benchmark it starts at 95s again, as if the VM were cold. Overall it even performs worse in the "real" benchmark than the warmup, in this particular instance.

The code calls the same methods and the same block in both phases, which is what baffles me. That is, benchmarking always eventually ends up at:

      def measure_block(samples, to_benchmark)
        start = Time.now
        to_benchmark.call
        finish = Time.now
        samples << (finish - start)
      end

The block called here runs a full Monte Carlo Tree Search with 1_000 playouts, so the workload itself is already really repetitive.

Even the entry point for warmup and benchmark is the same (the type passed in is :warmup or :time):

      def run(type)
        puts BENCHMARK_DESCRIPTION[type]
        divider
        time = @options[type]
        run_benchmarks(time)
        puts
      end

The only difference is the banner printed to STDOUT and the total time it gets to perform computations (I also tried using the same time for both phases; same result).

I'm aware that this behavior might be triggered by the same thing that @chrisseaton patched in the benchmark-ips gem, but I'm unsure what to do about it as it really baffles me. My understanding is probably not good enough.

One thing that seems to make it better is running the 9x9 benchmark along with it (and not just the 19x19 benchmark):

--------------------------------------------------------------------------------
[43.902, 15.013, 6.463, 9.926, 8.11, 12.158, 2.926, 2.545, 2.488, 2.747, 1.48, 1.361, 1.188]
9x9 1_000 iterations          7.07 i/min  8.49 s (avg) (± 130.95%)
[142.772]
19x19 1_000 iterations        0.42 i/min  142.77 s (avg) (± 0.0%)

Running the benchmark...
--------------------------------------------------------------------------------
[1.82, 1.554, 1.387, 1.813, 1.626, 1.343, 1.454, 1.985, 4.712, 5.154, 1.499, 1.054, 1.282, 1.305, 1.521, 1.034, 1.276, 1.172, 1.4489999999999998, 1.053, 0.914, 0.891, 0.848, 0.8470000000000001, 0.863, 0.884, 0.875, 2.248, 5.655, 5.729, 5.742, 5.735, 4.191, 4.296, 4.525, 5.443, 5.186, 4.338, 5.028, 4.104]
9x9 1_000 iterations          23.11 i/min  2.6 s (avg) (± 69.98%)
[31.766, 12.041, 7.499, 6.619, 5.739, 6.918, 18.809, 14.977]
19x19 1_000 iterations        4.6 i/min  13.05 s (avg) (± 63.64%)

What also makes it better is just running the benchmarking/time phase twice:

tobi@tobi-desktop ~/github/rubykon $ jruby+truffle run --graal -J-Xmx1500m benchmark/mcts_avg.rb 
$ JAVACMD=../graalvm-jdk1.8.0/bin/java ../jruby/bin/jruby -X\+T -J-Xmx1500m -Xtruffle.core.load_path\=../jruby/truffle/src/main/ruby -r ./.jruby\+truffle_bundle/bundler/setup.rb benchmark/mcts_avg.rb
Warming up...
--------------------------------------------------------------------------------
[101.404, 13.719]
19x19 1_000 iterations        1.04 i/min  57.56 s (avg) (± 76.17%)

Running the benchmark...
--------------------------------------------------------------------------------
[139.456]
19x19 1_000 iterations        0.43 i/min  139.46 s (avg) (± 0.0%)

Running the benchmark...
--------------------------------------------------------------------------------
[27.285, 20.432, 10.405, 9.472, 10.443, 10.77, 10.318, 6.666]
19x19 1_000 iterations        4.54 i/min  13.22 s (avg) (± 49.07%)

What to do here? I'm not sure. A hint at what else I could do to make it work with a warmup would be nice, or, well, making the VM not go cold here, although that's probably way more complicated than I can imagine :)

Thanks for all the work!
Tobi

@chrisseaton chrisseaton added this to the truffle-dev milestone Oct 11, 2015
@chrisseaton chrisseaton self-assigned this Oct 11, 2015
@chrisseaton (Contributor) commented

In general terms, the problem here is the same as before. You're running the interpreter in one state while warming up, and then in another state while measuring. The key method you warm up and then measure is the same, but between warming up and measuring you do some other work, like printing to the screen, and that work changes the state of the interpreter.

Running with -J-G:+TraceTruffleCompilation -J-G:+TraceTruffleTransferToInterpreter -J-G:+TraceTruffleAssumptions (https://gist.github.com/chrisseaton/e312438e4dc3c92fbdf0), you can see that something causes a lot of deoptimisation right after the warmup and right before measurement (https://gist.github.com/chrisseaton/e312438e4dc3c92fbdf0#file-gistfile1-txt-L416-L457). It all seems to revolve around Numeric#eql? from Rubinius (https://github.com/jruby/jruby/blob/master/truffle/src/main/ruby/core/rubinius/common/numeric.rb#L48). This is rewritten, which causes an avalanche of deoptimisations as it is inlined into so many things.

I didn't go far enough to figure out exactly what is causing Numeric#eql? to rewrite, but I can see that the type of local variables is changing, so it looks like something you run between warmup and measurement (probably within a method, within a method, within something in the core library etc) happens to call Numeric#eql? with a new type, maybe Bignum or something, or a long Fixnum instead of an int one, and that's where it all starts to go wrong.

Normally for key routines like that we always 'split', or generate a new copy of the method for each call site, but we don't do that for Rubinius code at the moment.

You are seeing one downside of aggressive optimisation, in that when the program changes the way it runs everything can unravel for a while. The key problem is that you are causing everything to unravel right as you start measuring.

How about running a single loop, measuring the time of each iteration for, say, 2 minutes, so you get an array of samples, and then removing the first 60s of samples? Then warmup and measurement are one unbroken action with no upheaval in between.
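A hypothetical implementation of that suggestion (the method name and parameters are mine, not from rubykon): one unbroken measurement loop, with the warmup samples discarded afterwards based on when each sample started.

```ruby
# One unbroken loop: record [offset_from_start, duration] for every run,
# then keep only the durations of runs that started after the warmup
# window. Nothing happens between warmup and measurement, so the
# interpreter state is never disturbed.
def measure_unbroken(total: 120, warmup: 60, &block)
  samples = []
  start = Time.now
  while (Time.now - start) < total
    before = Time.now
    block.call
    samples << [before - start, Time.now - before]
  end
  samples.select { |offset, _| offset >= warmup }
         .map { |_, duration| duration }
end

# Tiny budgets just for illustration; real runs would use minutes.
durations = measure_unbroken(total: 0.1, warmup: 0.05) do
  1_000.times { Math.sqrt(rand) }
end
```

The printing of results and any other bookkeeping then happens once, after all samples are collected.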


PragTob commented Oct 12, 2015

Thanks for the explanation @chrisseaton !

It seems like (I'd bet it is) the printing of results: various number comparisons are invoked there (just like over in benchmark-ips), and values are rounded and then printed out. It should just be additions, divisions etc., so I have no idea why #eql? is called, but it probably happens somewhere in that code.

Thanks!

Should I leave this open for further investigation on the Truffle side, or close it because I got the answer I needed and can now work around it?

@chrisseaton (Contributor) commented

I can't see exactly what is calling #eql?, but in the end it doesn't really matter. If we fixed that there might be something else that gets rewritten by the code between warmup and measurement.

I'll close this, with the same advice as before: any code between warmup and measurement is bad. I think most experienced VM researchers would agree with that. I don't think we want to 'fix' anything in JRuby+Truffle here, unfortunately.


@eregon (Member) commented Oct 28, 2015

Just for info: extract_times creates a Hash and adds multiple pairs to it, which calls eql? on the keys.
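A small demonstration of why Hash insertion ends up calling #eql? (this is standard Ruby Hash semantics, not rubykon code; the tracing class is made up for illustration): when two keys share the same #hash value, the Hash must fall back to #eql? to decide whether they are the same key.

```ruby
# Keys that count how often eql? is invoked. Returning a constant from
# #hash forces every key into the same bucket, so the Hash has to call
# #eql? to distinguish keys on insertion.
class TracingKey
  attr_reader :n, :calls

  def initialize(n)
    @n = n
    @calls = 0
  end

  def hash
    0 # deliberate collision: all keys hash equal
  end

  def eql?(other)
    @calls += 1
    n == other.n
  end
end

a = TracingKey.new(1)
b = TracingKey.new(2)
h = { a => :first }
h[b] = :second # same #hash, so eql? runs to tell a and b apart
```

With numeric keys the same mechanism dispatches to Numeric#eql?, which in JRuby+Truffle at the time was the Rubinius implementation that deoptimised above.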

@enebo enebo added this to the Non-Release milestone Dec 7, 2017