-
-
Notifications
You must be signed in to change notification settings - Fork 925
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Skip block-to-proc conversion for explicit block syntax when possible #2562
Conversation
15f11e0
to
839c35e
Compare
Thanks for the review. Yeah, I figured there were a few edge cases that I needed to handle better. The second case is tough, I didn't even consider it. I think allocating a new temp makes perfect sense. It seems like we'd want to have this be the very first compiler pass that gets run then, right? That way the new temp var can get optimized away when possible? Also what's the best place for me to add some explicit test cases for these edge cases? |
Replacing the reify with a %new_tmp = copy(%v1) would definitely solve the second issue. I agree that simplest for first problem is to not perform the optimization on scopes with BINDING_HAS_ESCAPED and if it has no nested scopes. Nested scopes are not insurmountable but it becomes more sophisticated to cope with. |
Yeah, I definitely think we just skip it with nested scopes. The kind of code that will benefit most from this is much more likely to fall into the simplest case |
I'll update once I get into the office |
Also it would be cool to see a small bench showing the benefit too. Numbers are fun :) |
Sure. I ran them showing 20% improvement for the simple case last night, I'll post the full benchmark though |
I've updated to account for the two issues raised. I still need to refactor the code a bit. Right now the code is assuming that the only place the result of |
The instructions generated for:
is now
|
@sgrif I was originally hoping to not have this be a pass because it will slow startup time down to run this on everything but @subbuss mentioned a simple mitigating solution. We can create an new IRFlag to mark scopes which contain block pass args (&foo) and only run this optimization pass for those scopes. That will eliminate this running in 95%+ of the scopes encountered. |
@enebo Absolutely. I am by no means coupled to keeping this as a separate pass. It just made getting the initial implementation done easier (given my unfamiliarity with the code base). I think the flag is a great solution though, I'll update to implement it. |
Here's the benchmarks you asked for. Script:
Before:
After:
|
Cool. I remember this started by your observation that a literal block was faster than a block pass var passed through. Are we the same speed for both now (I know you mentioned MRI but I am just curious)? 20% is nothing to sneeze at. |
I'll compare the two. I was actually hoping to show some real world comparisons on the code in AR that was extremely affected by this (on MRI), but last I checked the JDBC adapter still doesn't support 4.2. I've updated to skip this pass if the scope doesn't receive closure args. Turns out that flag was already there for us to use. Let me know if there's any other changes I need to make, if not I'll squash this down. |
Here's the comparison between the two forms. (script) Results on my branch:
I checked on master too, and it actually looks like the For reference here's the other results: jruby master:
MRI 2.2:
|
So only one more change and this looks great. Can you not add it as a pass if that flag is set instead of running the pass and having it decide to not operate on it? Startup time is a biggest risk in 9k right now and I noticed this small amount of setup logic to starting a pass is measurable :| |
@enebo Just confirming that sgrif@54757b7 is what you meant by that? |
54757b7
to
9348c0c
Compare
Or sgrif@9348c0c rather |
@sgrif yeah. That's what I wanted. |
Cool. I'll squash into a single commit. Were there any explicit tests I should add for this? |
If an explicit block is captured as a proc, but its only usage is converting it back to a block and passing it through, we can skip the conversion entirely, and just pass through the original block. This pass looks for all explicit block conversions, and checks if they're ever used in any context besides being passed as a block. If not, we rewrite all uses of the proc to pass the block through instead. The dead code elimination will remove the proc as an unused variable. It may be possible to implement this without adding another compiler pass, but I had trouble finding a place where it fit nicely, especially since it needs to run before the LiveVariableAnalysis. Given the following Ruby code: ```ruby def foo(&block) bar(&block) end ``` The instruction sequence before was: ``` %self = recv_self check_arity req: 0, opt: 0, *r: -1, kw: false, **r: -1 %v_1 = load_implicit_closure %t_block_2 = reify_closure(%v_1) line_num(1) %v_1 = call(FUNCTIONAL, bar, %self, [], &%t_block_2) return(%v_1) ``` With this patch, the instruction sequence changes to: ``` %self = recv_self check_arity req: 0, opt: 0, *r: -1, kw: false, **r: -1 %v_2 = load_implicit_closure line_num(1) %v_1 = call(FUNCTIONAL, bar, %self, [], &%v_2) return(%v_1) ``` Which gives a healthy boost to performance on any code which passes blocks through explicitly.
9348c0c
to
ea19e3c
Compare
No tests for this. Behaviorally I am waiting for CI to clear this though. |
@sgrif I wonder if there is still a problem: def(&b)
b = Proc.new
foo(&b)
end |
I'm not sure. I'm definitely replacing the local with a temp, but I'm not sure that's going to hurt anything, since the local will always get eliminated anyway (except for in cases where this pass wouldn't run). Are you specifically referring to the behavior of
|
@sgrif I should not have used Proc.new. I just mean if you end up reassigning variable block without some re-assignment detection you may pass wrong version of block? |
As far as I can tell, it's fine. The only side effect is that the local variable will get replaced with a temp variable instead, which I don't think is a problem. It still assigns everything to the right variable no problem. def foo(&block)
block = method_returning_proc(&block)
bar(&block)
end
on master:
So it looks like the only time the results would be different would be some case where the "Optimize Dynamic Scopes" pass doesn't replace the local variable with a temporary one. But I don't think there are any cases where that would happen and we run this pass. |
So in looking I cannot find a scenario which breaks. My notes ( @subbuss will also be looking tonight):
def foo(&block)
block = something if something_else
bar &block
end will not apply this because block is used as an ordinary variable. Good.
So I am back to thinking this is working :) |
Yeah, sounds like we're on the same page on that, then. I'm certainly happy to just skip the optimization if the variable is assigned if you guys would be more comfortable with that, but it seems to work regardless. |
@sgrif I am about done for today but with your patch applied I cannot run gem list. I reduced it to this case: def bar
yield
end
def foo(&block)
bar &block if block_given?
end
foo So there is still a somewhat simple error to correct. We will get there :) |
Cool, I'm heading out too. I'll dig into it tomorrow. On Thu, Feb 5, 2015, 4:19 PM Thomas E Enebo notifications@github.com
|
Try this:
You get:
So, you've effectively lost the "block = nil" initialization. Arbitrary variable renaming is tricky if the scope is not in SSA form or if you are not using use-def information. You have to make sure intermediate instructions don't clobber the variable that is being renamed downstream. I think the safe solution is to use what Tom proposed earlier which also significantly simplifies this code, and that is to replace the "v = reify_closure(block)" to "v = copy(block)" which just passes through the block unmodified (which is the intention of your optimization) without having to deal with any of the use-def hazards. And, later passes could optimize the copy and even if gets to bytecode, the JVM will trivially take care of the copy. I should have thought of this yesterday, but better late than never :-) .. Take a look at passes/LocalOptimizationPass.java and how it has to deal with these same use-def issues, and that pass only handles things within a basic-block. That pass effectively replaces a simplifiable instruction with a copy (which is what you are doing as well). Since your pass needs global information (i.e. no instr. within the scope uses the block in any other way), we cannot implement the simplification code in LocalOptimizationPass. However, since 'scope' is passed into simplifyAndGetResult method of Instr.java, ReifyInstruction.java could implement the global analysis as part of its implementation of simplifyAndGetResult and everything should work fine ... However, that feels a bit hacky at this time (for some reason that I cannot yet quite articulate) .. so, let us leave this as a separate pass right now and we can revisit how to merge it in later. We can chat more on IRC tomorrow if required. |
Thanks, subbuss. I'll dig into this further tomorrow, and ping you on IRC On Thu, Feb 5, 2015, 10:40 PM subbuss notifications@github.com wrote:
|
If an explicit block is captured as a proc, but its only usage is converting it back to a block and passing it through, we can skip the conversion entirely, and just pass through the original block. This pass looks for all explicit block conversions, and checks if they're ever used in any context besides being passed as a block. If not, we rewrite the `ReifyProcInstr` to just copy the block, instead. Given the following Ruby code: ```ruby def foo(&block) bar(&block) end ``` The instruction sequence before was: ``` %self = recv_self check_arity req: 0, opt: 0, *r: -1, kw: false, **r: -1 %v_1 = load_implicit_closure %t_block_2 = reify_closure(%v_1) line_num(1) %v_1 = call(FUNCTIONAL, bar, %self, [], &%t_block_2) return(%v_1) ``` With this patch, the instruction sequence changes to: ``` %self = recv_self check_arity req: 0, opt: 0, *r: -1, kw: false, **r: -1 %v_1 = load_implicit_closure %t_block_2 = copy(%v_1) line_num(1) %v_1 = call(FUNCTIONAL, bar, %self, [], &%t_block_2) return(%v_1) ``` Which gives a healthy boost to performance on any code which passes blocks through explicitly.
I've updated to use a copy instruction, rather than renaming the variable |
|
The problem is that &block need not receive a block so it instead gets assigned to nil if no block is given. |
I'm having trouble isolating when that's an issue, though. This code works fine.
|
With your changes that now runs properly. I thought it minimized the issue in optparse but apparently it is more complex. The error now complains about Block and IRubyObject and before it was casting between Block and RubyNil. |
Yeah it looks like there is some form of a conditional that it isn't detecting the block being used. I'm digging further now. |
It definitely appears to be |
It is happening here 'executing b_false(LBL_6:33, %t_block_8)' #
# See OptionParser.accept.
#
def accept(t, pat = /.*/m, &block)
if pat
pat.respond_to?(:match) or
raise TypeError, "has no `match'", ParseError.filter_backtrace(caller(2))
else
pat = t if t.respond_to?(:match)
end
unless block
block = pat.method(:convert).to_proc if pat.respond_to?(:convert)
end
@atype[t] = [pat, block]
end with IR:
|
Jinx, you owe me a coke. |
Ok, I think I see the problem. I'm doing the tracking & replacing per basic block, but I need to do it for the whole scope. |
We should be good now |
Great. We will wait for ci seal of approval now |
Skip block-to-proc conversion for explicit block syntax when possible.
@sgrif I will make some style tweaks later today for code style and I will refer to the commit here. I did not want to pester you with those and would rather just show them. |
And thanks again for digging into this. |
❤️ 💛 💚 💙 💜 ❤️ 💛 💚 💙 💜 ❤️ 💛 💚 💙 💜 ❤️ 💛 💚 💙 💜 ❤️ 💛 💚 💙 💜 ❤️ 💛 💚 💙 💜 |
If an explicit block is captured as a proc, but its only usage is
converting it back to a block and passing it through, we can skip the
conversion entirely, and just pass through the original block.
This pass looks for all explicit block conversions, and checks if
they're ever used in any context besides being passed as a block. If
not, we rewrite all uses of the proc to pass the block through instead.
The dead code elimination will remove the proc as an unused variable.
It may be possible to implement this without adding another compiler
pass, but I had trouble finding a place where it fit nicely, especially
since it needs to run before the LiveVariableAnalysis.
I believe the same optimization could occur to skip array allocation for splatted args, and hash allocation for double splatted args, in this same compiler pass.
Given the following Ruby code:
The instruction sequence before was:
With this patch, the instruction sequence changes to:
Which gives a healthy boost to performance on any code which passes
blocks through explicitly.