
Performance issue with ruby meta programming #4879

Closed

GaneshSPatil opened this issue Dec 1, 2017 · 27 comments

Comments

@GaneshSPatil

Environment

  • JRuby version (jruby -v): jruby 1.7.26 (1.9.3p551) 2016-08-26 69763b8 on Java HotSpot(TM) 64-Bit Server VM 1.8.0_101-b13 +jit [darwin-x86_64]
  • Operating system and platform (e.g. uname -a): Darwin **.local 17.2.0 Darwin Kernel Version 17.2.0: Fri Sep 29 18:27:05 PDT 2017; root:xnu-4570.20.62~3/RELEASE_X86_64 x86_64

Description

Extensive use of Ruby's extend functionality blocks threads, which in turn causes server performance issues.
The extend method calls JRuby's addSubclass function, which is a bottleneck.

Background

Many libraries that provide JSON serialization make use of extend. Under heavy load, this causes performance degradation.
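
For illustration, the pattern in question looks roughly like the following (a hypothetical representer module, not any particular gem's actual code): every object to be serialized gets extended on the fly, right before dumping.

require 'json'

# Hypothetical serializer module, mixed into each object just before dumping.
module SongRepresenter
  def to_json(*)
    { title: title }.to_json
  end
end

Song = Struct.new(:title)

# Every serialization pays for a fresh extend on a fresh instance.
puts Song.new("Roxanne").extend(SongRepresenter).to_json  # {"title":"Roxanne"}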

Code Snippet to Reproduce the Issue:

module Bar
  def hello
    p "hello from bar!"
  end
end

class Foo
  def hello
    p "hello!"
  end
end

# Spawn 1,000 threads, each creating and extending 10,000 fresh Foo instances.
threads = 1000.times.collect do
  Thread.new do
    10000.times do |i|
      foo = Foo.new
      foo.extend(Bar)
    end
  end
end

threads.each { |t| t.join }

Run the above code snippet and capture a thread dump. You will see a lot of threads waiting to enter a synchronized block.

Here is an example thread dump.

@ketan
Member

ketan commented Dec 1, 2017

For perspective, this is locking up a significant number of Jetty web threads because object.extend is being invoked by a Ruby gem.

@kares
Member

kares commented Dec 1, 2017

... is this yet another report against (the no longer supported) JRuby 1.7.x, or did you guys repro on 9K?

@kares
Member

kares commented Dec 1, 2017

got it, that addSubclass piece should not have changed since 1.7, so it should be an issue on 9K as well
... it should be fine to synchronize over the RubyClass instance only, but there must be a reason the global lock is (also) there

@ketan
Member

ketan commented Dec 1, 2017

We've run this on jruby 9000 as well, same result.

@headius
Member

headius commented Dec 1, 2017

We can probably improve the synchronization here, but extend has a pretty big cost even on standard Ruby. Doing it for every JSON serialization will severely limit throughput.

There may be ways to improve this and do a more narrow synchronization, but it will be tricky to reduce the cost of extend.

@ketan
Member

ketan commented Dec 1, 2017

We can probably improve the synchronization here, but extend has a pretty big cost even on standard Ruby. Doing it for every JSON serialization will severely limit throughput.

There may be ways to improve this and do a more narrow synchronization, but it will be tricky to reduce the cost of extend.

Could you elaborate on this, or point me to something I can read to understand it better?

@kares
Member

kares commented Dec 1, 2017

could you guys link us to the JSON code doing extend ... I thought the DCI (anti-)pattern was dead at this point.
... so I am curious as to why a 'generic' library such as JSON would do that, on every parse?

@enebo
Member

enebo commented Dec 1, 2017

@ketan you are changing the definition of what foo calls every time you call extend, and each time it is a new foo. It does happen to be changing each new instance to the same type every time, but this is very hard to see internally (we don't detect it, and MRI does not either). For that reason we heavily discourage this pattern. DCI also used this pattern, and here is an article which delves into it in more detail:

https://tonyarcieri.com/dci-in-ruby-is-completely-broken
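
To make the point concrete, here is a minimal sketch using the same Foo and Bar shape as the reproduction snippet (redefined here so the snippet stands alone): each extended instance carries its own freshly built singleton class, so no two extended objects ever look like the same type to a method cache.

module Bar
  def hello; p "hello from bar!"; end
end

class Foo
  def hello; p "hello!"; end
end

a = Foo.new.extend(Bar)
b = Foo.new.extend(Bar)

p a.singleton_class.ancestors.first(3)    # [#<Class:#<Foo:0x...>>, Bar, Foo]
p a.singleton_class == b.singleton_class  # false; a new hidden class per object
p [a.class, b.class]                      # [Foo, Foo]; .class hides the singleton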

@ketan
Member

ketan commented Dec 1, 2017

Thank you for the explanation. Given that this is a known issue (with all ruby implementations?), do you suggest closing this issue out?

@enebo
Member

enebo commented Dec 1, 2017

@ketan I am not sure. @headius wonders if we can improve the lock to maybe make extend not stack up as much. If that can be improved, it will improve this situation a tiny bit.

We are interested in whether a major library is doing this DCI pattern or not. It would be nice to help them speed up if we can. Was this in a library/gem?

@headius
Member

headius commented Dec 2, 2017

Could you tell us what gem is doing this? Ideally, we'd be able to help them find an alternative.

The first time you extend an object we have to construct a new hidden class between the object and its "real" class. That means dozens of objects to support the additional class, plus difficulties caching methods because each object looks like a new type.

The cost of extend should also be an issue on MRI. You'll get better performance on both impls if you avoid heavy use of extend.

Are we significantly slower than MRI here?

@ketan
Member

ketan commented Dec 3, 2017

The gem we're using is representable. Here is the line responsible for the object.extend: https://github.com/trailblazer/representable/blob/3b66945c7de584a0c70dc1316101903f7ad1ebb7/lib/representable.rb#L116-L120. We've also tested with a couple of other gems, with an identical end result: threads being BLOCKED because of RubyClass#addSubclass. (I suppose there's something to be said about paying a performance cost for developer productivity :)

Our app does not run on MRI, so I'm in no position to compare performance with a real app. However, a quick benchmark with the example below seems to indicate that JRuby is at least 2 times faster than MRI.

require 'rubygems'
require 'benchmark/ips'

class ExampleClass
  def foo; 42; end
end

module ExampleMixin
  def foo; 43; end
end

puts "Running with #{RUBY_DESCRIPTION}"

Benchmark.ips(20) do |bm|
  bm.report("without dci") { ExampleClass.new.foo }
  bm.report("with dci") do
    obj = ExampleClass.new
    obj.extend(ExampleMixin)
    obj.foo
  end
end

And the results:

Running with jruby 9.1.14.0 (2.3.3) 2017-11-08 2176f24 Java HotSpot(TM) 64-Bit Server VM 25.121-b13 on 1.8.0_121-b13 +jit [darwin-x86_64]
Warming up --------------------------------------
         without dci   257.027k i/100ms
            with dci    36.995k i/100ms
Calculating -------------------------------------
         without dci     15.826M (± 8.6%) i/s -    311.003M in  19.985248s
            with dci      1.152M (±35.9%) i/s -      6.622M in  20.439169s
Running with ruby 2.3.4p301 (2017-03-30 revision 58214) [x86_64-darwin15]
Warming up --------------------------------------
         without dci   226.292k i/100ms
            with dci    36.996k i/100ms
Calculating -------------------------------------
         without dci      6.081M (± 7.9%) i/s -    120.614M in  20.072074s
            with dci    371.584k (±21.3%) i/s -      6.955M in  20.062720s

@enebo
Member

enebo commented Dec 3, 2017

@ketan It seems unlikely we will convince these authors not to use the DCI pattern, as one or more have done talks on it and I believe they think it is worth the cost (i.e. they know the costs); but from your particular perspective a snippet like:

SongRepresenter.new(song).to_json

should not need to be implemented with DCI at all. I think perhaps the main benefit would be the simple ability to access internal state without going through accessors. OTOH, what JSON do you plan on dumping where that state is not already public? Obviously, I am not asking you to write another implementation, but the basic premise of decoupling this logic into separate types might not be too hard if you decide that the cost is too high.
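
As a rough sketch of that decoupled shape (hypothetical code, not the representable gem's API): the representer can be an ordinary object that reads the song's public state, so no instance ever gets extended.

require 'json'

# Hypothetical extend-free representer: a plain wrapper that reads the wrapped
# object's public state instead of mutating its singleton class.
class SongRepresenter
  def initialize(song)
    @song = song
  end

  def to_json(*)
    { title: @song.title, artist: @song.artist }.to_json
  end
end

Song = Struct.new(:title, :artist)
puts SongRepresenter.new(Song.new("Roxanne", "The Police")).to_json
# {"title":"Roxanne","artist":"The Police"}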

@headius
Member

headius commented Dec 5, 2017

There are ways to optimize this but it's something we're reluctant to do when this pattern is so slow in MRI. Even if we optimize it, most users will not see this pattern run fast unless ruby-core also decides to optimize it.

Ideas for optimization:

  • Given a sequence of extends, we can reuse the same set of include wrappers for all cases. This blunts the base cost of extend (which, for the first call, creates both a new singleton class and a new included module wrapper).
  • We can cache those wrappers either in the main class itself or at the extend call site. Basically, if an extend call site is calling Object#extend and the class's current type and the extended module are the same, we use the cached pseudo-type. (A rough user-level analog of this caching is sketched just after this list.)
  • The tricky bit is if the object's singleton class needs one-off modifications, like defining singleton methods. Caching these types would require logic at def sites to know that the old "shape" of the singleton and the new "shape" after adding a method are cached and reusable.
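
A rough user-level analog of the caching idea above (not the proposed JRuby-internal change, only an illustration of paying the composition cost once and then reusing the resulting type):

# Build each Foo+Bar combination once and reuse it, instead of creating a new
# singleton class on every extend call. Foo and Bar are the classes from the
# reproduction snippet; the cache itself is illustrative only (a real version
# would need thread-safe memoization).
COMPOSED = Hash.new do |cache, (base, mod)|
  cache[[base, mod]] = Class.new(base) { include mod }
end

def composed_new(base, mod)
  COMPOSED[[base, mod]].new
end

foo = composed_new(Foo, Bar)
foo.hello                                    # prints "hello from bar!"
p composed_new(Foo, Bar).class == foo.class  # true; one shared composed class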

I'm inclined to close this bug as Won't Fix since we're performing at least as well as MRI on a very poor-performing pattern. If MRI sees fit to optimize this pattern for core Ruby, we'd be happy to work with them.

@ketan
Member

ketan commented Dec 5, 2017

@headius and @enebo — thank you for the consideration.

I agree that the DCI pattern seems like overkill for this case. We're considering options to move to other JSON libraries in Ruby, or, failing that, use something in Java :-/

@headius
Member

headius commented Dec 5, 2017

@ketan I would present your findings to the authors of that library, and if they really want to use this pattern perhaps encourage them to take it up with MRI. Many folks using the DCI pattern don't realize what a huge impact it has on performance, nor that there are reasonable alternatives.

Closing this as Won't Fix for now.

@headius headius closed this as completed Dec 5, 2017
@headius headius added this to the Won't Fix milestone Dec 5, 2017
@kares
Member

kares commented Dec 5, 2017

@headius @enebo wondering: is the double sync really needed? Isn't locking the RubyClass enough?

@headius
Member

headius commented Dec 6, 2017

@kares This may be something we can improve. I traced those lines to 8549533, which was over 8 years ago and may no longer hold. I'll investigate a bit more.

@headius headius reopened this Dec 6, 2017
@headius
Member

headius commented Dec 6, 2017

I'm reopening this because it does appear we can eliminate some (maybe a lot) of the locking here.

@kares I believe you are right. The methods that only modify the local subclasses need only synchronize on that collection. Methods that update the subclasses field must synchronize on the RubyClass. And then there are methods that actually recurse into subclasses...

I will do a PR (for 9.1.16) that removes most of this synchronization, makes the rest narrower, and ideally improves the overhead of subclass modifications and method invalidation.

@headius
Member

headius commented Dec 6, 2017

Note: this is going into 9.1.16 because it's high risk to be fiddling about with these locks right before a release. We'll let it bake over the holiday and do a release after the new year.

@GaneshSPatil Looks like you may get a partial fix after all! It won't improve the straight-line speed of extend, but it should reduce the contention you reported. Please stand by to test your app against the PR.

@headius
Member

headius commented Dec 7, 2017

@GaneshSPatil You could start testing that PR any time. Better sooner than later, since I know we'll uncover some concurrency issues in the process of reworking this locking.

@ketan
Member

ketan commented Dec 7, 2017

@headius — thanks for the PR. We'll give this a spin today and get back to you.

@TheKidCoder

@ketan Any update on your findings from the PR?

@GaneshSPatil
Author

@TheKidCoder -- We've tested the PR on the performance test setup on which we originally found the performance issue.

Our findings:

  • We didn't see any blocked threads around the addSubclass method with the PR changes. The original issue regarding server performance problems with Ruby's extend functionality is addressed by the PR.

  • After upgrading JRuby from 1.7.26 to 9.1.15, we've faced performance issues. One of our API endpoints, which used to take 12s to 15s to respond, now takes around 45s to a minute. These timings were captured on an idle server; on a heavily loaded server, we expect the APIs to take even more time to respond.

@kares
Copy link
Member

kares commented Jan 23, 2018

Sounds like the original issue is resolved (this can be closed). @GaneshSPatil, for the other 'performance' issue I would file a new bug when you know the details. If you need help figuring it out, I am happy to consult.

@enebo
Copy link
Member

enebo commented Jan 23, 2018

@GaneshSPatil we really want to know why there is such a large difference in time. That really sounds like something which should JIT but isn't for some reason (large methods being one common reason -- 9k is a completely different set of internals, and some methods could be small enough in 1.7 to JIT but end up being larger in 9k -- this is just an example). Please open a new issue with any information.

@headius
Copy link
Member

headius commented Jan 24, 2018

Fixed in 9.1.16 and confirmed by reporter.

@headius headius closed this as completed Jan 24, 2018