Various literal to_proc'ed Symbol optimizations. #3571

headius · 2015-12-28T22:54:01Z

Does this seem like a good optimization? Are there any gotchas I've missed?

* Use a dummy binding rather than creating a new one every time.
* Cache the resulting proc at the create site.

With the second optimization, a given &:foo in code creates only a
single Proc, ever, and caches it at that point in the code. This is
based on the observation that symbol procs are typically used to
iterate over homogeneous collections of objects, so caching the proc
lets its cache stay populated and local to the related code. It also
eliminates the allocation of a Block, BlockBody, and RubyProc on each
encounter, which improves performance even for heterogeneous
collections with poor cacheability.
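
To make the cache argument concrete, here is a toy model in plain Ruby. `ModelSymbolProc` is a hypothetical class, not JRuby's internal symbol proc or CallSite code. It dispatches through a single-entry (monomorphic) cache keyed on the receiver's class. Creating a new one on every encounter means every pass starts with a cold cache. A single instance kept at the site stays warm across passes over a homogeneous array.

```ruby
# Toy model only: a hypothetical class, not JRuby's RubySymbol or CallSite code.
# It stands in for a symbol proc that dispatches through a one-entry
# (monomorphic) cache keyed on the receiver's class.
class ModelSymbolProc
  attr_reader :misses

  def initialize(name)
    @name = name
    @cached_class = nil
    @cached_method = nil
    @misses = 0
  end

  def call(receiver)
    klass = receiver.class
    unless klass.equal?(@cached_class)
      # Cache miss: look the method up again and replace the single entry.
      @misses += 1
      @cached_class = klass
      @cached_method = klass.instance_method(@name)
    end
    @cached_method.bind(receiver).call
  end

  # Allows passing an instance with &, as with a real Proc.
  def to_proc
    method(:call).to_proc
  end
end

ary = [1, 2, 3, 4]

# No caching: each encounter of the block-pass builds a fresh proc,
# so each pass starts with a cold cache.
fresh_misses = 0
3.times do
  fresh = ModelSymbolProc.new(:to_s)
  ary.each(&fresh)
  fresh_misses += fresh.misses
end

# Cached at the create site: one proc whose cache stays warm.
site_proc = ModelSymbolProc.new(:to_s)
3.times { ary.each(&site_proc) }

# Heterogeneous input defeats a one-entry cache either way.
mixed = ModelSymbolProc.new(:to_s)
[1, "a", 2, "b"].each(&mixed)

puts fresh_misses      # prints 3
puts site_proc.misses  # prints 1
puts mixed.misses      # prints 4
```

With the mixed array, the model misses on every element. In that case the benefit of caching comes from avoiding the per-encounter allocations, not from the cache.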

Benchmark:

```ruby
require 'benchmark'

loop {
  puts Benchmark.measure {
    ary = [1,2,3,4]
    1_000_000.times {
      ary.each(&:object_id)
    }
  }
}
```
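
On an implementation that does not perform this optimization, you can approximate it by hand. Hoist the symbol proc into a constant so that every pass reuses one Proc. The sketch below compares the two forms with `Benchmark.bm`. `OBJECT_ID_PROC` is an illustrative name. No particular timings are implied, because results vary by Ruby implementation and version.

```ruby
require 'benchmark'

# Illustrative manual version of per-site caching: build the symbol proc once,
# keep it in a constant, and reuse it on every pass.
OBJECT_ID_PROC = :object_id.to_proc

ary = [1, 2, 3, 4]
n = 100_000

# Benchmark.bm prints a table and returns one Benchmark::Tms per report.
reports = Benchmark.bm(8) do |x|
  x.report("literal") { n.times { ary.each(&:object_id) } }
  x.report("hoisted") { n.times { ary.each(&OBJECT_ID_PROC) } }
end
```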

Before (columns: user CPU, system CPU, total CPU, and real elapsed time, in seconds):

```
  1.270000   0.070000   1.340000 (  0.710043)
  0.640000   0.020000   0.660000 (  0.511692)
  0.470000   0.000000   0.470000 (  0.460667)
  0.490000   0.010000   0.500000 (  0.480732)
  0.470000   0.000000   0.470000 (  0.462888)
```

Just the dummy binding optimization:

```
  1.210000   0.070000   1.280000 (  0.660924)
  0.540000   0.020000   0.560000 (  0.432614)
  0.430000   0.000000   0.430000 (  0.422502)
  0.430000   0.000000   0.430000 (  0.416549)
  0.410000   0.010000   0.420000 (  0.412461)
```

And with proc caching:

```
  0.890000   0.060000   0.950000 (  0.456065)
  0.410000   0.020000   0.430000 (  0.279023)
  0.290000   0.000000   0.290000 (  0.282117)
  0.300000   0.010000   0.310000 (  0.288516)
  0.270000   0.000000   0.270000 (  0.270100)
```

headius · 2015-12-28T22:54:54Z

Note that the dummy binding also reduces the overall cost of calling Symbol#to_proc, even when the proc is not cached.

enebo · 2015-12-29T01:12:19Z

My only reservation would be any complexity it adds to Block/BlockBody, but it appears to add none (one custom type is replaced by another). At worst, could there be something odd with exotic parameter binding?

I am wondering whether we could write to_proc in Ruby (or IR assembly) and then potentially inline it. Since inlining is still in its infancy, though, this might not be worth discussing now...
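
On the parameter-binding question: a shared or cached symbol proc must keep the standard calling behavior of `Symbol#to_proc`, whatever it looks like internally. The sketch below checks a few of those cases against ordinary Ruby semantics. It does not show how JRuby implements them.

```ruby
to_s_proc = :to_s.to_proc
plus_proc = :+.to_proc

# The first argument becomes the receiver...
as_receiver = to_s_proc.call(42)              # => "42"
# ...and any remaining arguments are passed on to the method.
with_args = plus_proc.call(1, 2)              # => 3
# An Array yielded as a single value is the receiver; it is not splatted.
not_splatted = [[1, 2], [3, 4]].map(&:first)  # => [1, 3]

# With no receiver at all, the call raises ArgumentError.
no_receiver =
  begin
    to_s_proc.call
    :no_error
  rescue ArgumentError
    :argument_error
  end
```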

headius · 2015-12-29T01:21:26Z

The new inner class just pulled out an existing anonymous inner class, so there are no new classes. That part of the change was mostly lateral, but I hate large anonymous inner classes.

My main concerns were around the dummy binding, but it seems fine in testing, since the block body is not implemented in Ruby. The proc should perhaps be frozen... but can you do anything to a proc?
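
As for whether you can do anything to a proc: yes. A Proc is an ordinary object, so it can hold instance variables and singleton methods. The sketch below uses a plain `Proc` held in a constant as a stand-in for a site-cached proc. `CACHED_PROC` and `cached_site_proc` are hypothetical names; JRuby's cache lives inside the runtime. The sketch shows that such state would persist across encounters and that freezing rejects it.

```ruby
# Hypothetical stand-in: a plain Proc held in a constant plays the role of the
# single proc a code site would hand back on every encounter.
CACHED_PROC = Proc.new { |obj| obj.object_id }

def cached_site_proc
  CACHED_PROC
end

# A proc can carry instance variables; with a cached proc, state set on one
# encounter is still there on the next.
cached_site_proc.instance_variable_set(:@note, "set on first encounter")
later_note = cached_site_proc.instance_variable_get(:@note)

# Freezing rejects both instance variables and singleton methods.
FROZEN_PROC = Proc.new { |obj| obj.object_id }.freeze
rejected = []
begin
  FROZEN_PROC.instance_variable_set(:@note, "x")
rescue RuntimeError # FrozenError (Ruby 2.5+) is a subclass of RuntimeError
  rejected << :ivar
end
begin
  FROZEN_PROC.define_singleton_method(:extra) { :extra }
rescue RuntimeError
  rejected << :singleton_method
end
```

Freezing does not affect calling the proc; `FROZEN_PROC.call(obj)` still works as before.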

Inlining... yes, that is definitely a possibility. It may be easier without the moving target of a new proc every time anyway, though. If we can inline to_proc, we could see that it is a send of a constant name, which is just a call site. We'd need to inline through #each as well, but in theory it could go all the way through to the method that the send actually calls.

In any case, knowing it is a constant block rather than a new one should help with that.
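
For illustration, this is the transformation a full inline would aim for. Passing `&:object_id` to an iterator denotes the same calls as a plain block that sends `object_id` to each element, which is an ordinary call site. The sketch only compares results; it is not output from JRuby's IR inliner.

```ruby
ary = [1, 2, 3, 4]

# The block-pass form from the benchmark...
via_symbol_proc = ary.map(&:object_id)
# ...and the plain call site that fully inlining to_proc could reduce it to
# (an illustration of the idea, not JRuby IR output).
via_plain_block = ary.map { |x| x.object_id }

puts via_symbol_proc == via_plain_block # prints true
```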

Various literal to_proc'ed Symbol optimizations.

headius added a commit that referenced this pull request Dec 30, 2015

Merge pull request #3571 from jruby/symbol_to_proc_cache

cca4a33

Various literal to_proc'ed Symbol optimizations.

headius merged commit cca4a33 into master Dec 30, 2015

headius deleted the symbol_to_proc_cache branch December 30, 2015 20:42

headius restored the symbol_to_proc_cache branch January 19, 2016 16:02

headius deleted the symbol_to_proc_cache branch January 19, 2016 16:04

enebo modified the milestone: Non-Release May 25, 2016
