[Truffle] VM unexpectedly "cold" running the same methods (benchmarking) #3387
Comments
In general terms, the problem here is the same as before: you're running the interpreter in one state while warming up, and then in another state while measuring. The key method you are both warming up and measuring is the same, but you still warm up, do some other stuff like printing to the screen, and then start measuring. That bit in between does things that change the state of the interpreter. Running with … I didn't go far enough to figure out exactly what is causing it.

Normally for key routines like that we always 'split', i.e. generate a new copy of the method for each call site, but we don't do that for Rubinius code at the moment.

You are seeing one downside of aggressive optimisation: when the program changes the way it runs, everything can unravel for a while. The key problem is that you are causing everything to unravel right as you start measuring.

How about running a single loop, measuring the time of each iteration for, say, 2 minutes, so you get an array of samples? Then remove the first 60s of samples. That way warmup and measurement are one unbroken action with no upheaval in between.
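The suggested methodology above can be sketched as follows. This is a minimal illustration, not code from the project; the names `sample_durations` and `steady_state` are made up:

```ruby
# Run ONE unbroken loop for `total` seconds of wall-clock time, recording
# each iteration's duration together with when (relative to the start) it
# began. Warmup and measurement are the same loop, so nothing runs in
# between to perturb the interpreter's state.
def sample_durations(total: 120)
  samples = [] # [seconds_since_start_at_begin, duration] pairs
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  loop do
    now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    break if now - start >= total
    yield
    samples << [now - start, Process.clock_gettime(Process::CLOCK_MONOTONIC) - now]
  end
  samples
end

# Discard everything that started during the first `discard` seconds and
# return only the remaining (hopefully steady-state) durations.
def steady_state(samples, discard: 60)
  samples.select { |started_at, _| started_at >= discard }
         .map { |_, duration| duration }
end
```

Usage would be e.g. `steady_state(sample_durations(total: 120) { work }, discard: 60)`, matching the "2 minutes total, drop the first 60s" suggestion.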
Thanks for the explanation @chrisseaton! It seems like (I'd even bet) it is the printing of results: various number comparisons are invoked there (just like over at benchmark-ips), and values are rounded and then printed out. It should just be additions, divisions etc., so no idea why. Thanks!

Should I leave this open for further investigation on the Truffle side, or close it because I got the answer I needed and can now work around it?
I can't see exactly what is calling it. I'll close this, just with the same advice: any code between warmup and measurement is bad. I think most experienced VM researchers you talk to would agree with that. I don't think we want to 'fix' anything in JRuby+Truffle here, unfortunately.
Just for info,
Me again ;)
At rubykon (currently still on a branch) I started writing a very simple benchmarking tool to benchmark the final use case, which is expected to take multiple seconds to finish. Its code lives in `lib/benchmark/avg`. It runs the given block several times (until a time threshold is exceeded) to warm up, and then runs it until a time threshold is reached again (the "real" benchmark). I currently have debug output that prints the times of individual samples, which made me notice that after the warmup Truffle is basically as slow as it was at the beginning of the warmup. This report is as of this commit.
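The warmup-then-measure shape described above can be sketched roughly like this. It is an illustration of the structure, not the actual `lib/benchmark/avg` API; `run_phase` and `benchmark_avg` are invented names:

```ruby
# Run the block repeatedly until `seconds` of wall-clock time have elapsed,
# returning the duration of every individual run (the block always runs at
# least once).
def run_phase(seconds)
  times = []
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + seconds
  begin
    before = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    times << Process.clock_gettime(Process::CLOCK_MONOTONIC) - before
  end while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
  times
end

# Warm up, then measure. Note that whatever happens between the two phases
# (here: printing the debug output) runs right before measurement starts,
# which is exactly the spot an optimising VM is sensitive to.
def benchmark_avg(warmup:, time:, &block)
  warmup_times = run_phase(warmup, &block)
  puts "warmup samples: #{warmup_times.inspect}"
  run_phase(time, &block)
end
```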
The behavior I'm seeing is the following:
If you look at the arrays printed there for debug, those are the sample times in seconds. It starts at 96 seconds in the warmup, then 16s. When it goes to the "real" benchmark it starts at 95s again, as if the VM were cold. Overall it even performs worse in the "real" benchmark than the warmup, in this particular instance.
The code calls the same methods and the same block, which is what baffles me; i.e. benchmarking always eventually ends up at:
The block called here runs a full Monte Carlo Tree Search with 1_000 playouts, so that is also already really repetitive.
Even the entry point for warmup and benchmark is the same though (the type passed in is `:warmup` or `:time`). The only difference is the banner printed to STDOUT and the total time it gets to perform computations (I also tried using the same time, with the same result).
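To make that concrete, a shared entry point for both phases might look something like the following. This is a hypothetical sketch, not the actual rubykon code; `run_measured` and `BANNERS` are illustrative names:

```ruby
# One entry point for both phases: the only per-phase difference is which
# banner gets printed, yet the print itself happens between warmup and
# measurement.
BANNERS = { warmup: "Warming up...", time: "Benchmarking..." }.freeze

def run_measured(type, seconds, &block)
  puts BANNERS.fetch(type) # the only difference between the two phases
  samples = []
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + seconds
  while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
    before = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    block.call
    samples << Process.clock_gettime(Process::CLOCK_MONOTONIC) - before
  end
  samples
end
```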
I'm aware that this behavior might be triggered by the same thing that @chrisseaton patched in the benchmark-ips gem, but am unsure what to do about it as it really baffles me. My understanding is probably not good enough.
One thing that seems to make it better is running the 9x9 benchmark along with it (and not just the 19x19 benchmark):
What also makes it better is just running the benchmarking/time phase twice:
What to do here? Not sure. A hint about what else I could do to make it work with warmup would be nice, or, well, making the VM not get cold here, although that's probably way more complicated than I imagine :)
Thanks for all the work!
Tobi