Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jruby9k not allowing set_trace_func without --debug, diverges from mri behavior #3096

Closed
jasonayre opened this issue Jul 1, 2015 · 24 comments
Milestone

Comments

@jasonayre
Copy link

After loading up my application using 9k, it prints following warning all over the place (and which breaks my application code, if I dont pass debug):

tracing (e.g. set_trace_func) will not capture all events without --debug flag

This behavior is divergent from MRI's behavior, which as I understand you try to mirror as close as possible, so I'm wondering whether this diversion is intentional or not. If it is, I'd like to appeal that it be removed.

Truthfully, the real reason I'd like for the restriction to be removed has nothing to do with above but rather, I have numerous libraries where I've been (ab?)using TracePoint (which uses set_trace_func), with regardless of whether you prefix the word or not, has been to much success.

The best use case (and an example of something that cant be done in ruby without set_trace_func), can be seen here (an abstract methods implementation). Because when the module is included I cant check which methods have been defined on the class, as module included is called before the method definitions, I can get around this by watching for the end of the class definition using trace point. I can also use a simple class instance variable setter pattern to store the method names, since all of that can happen, before calling the after_included hook allowed by tracepoint.

https://github.com/jasonayre/trax_core/blob/master/lib/trax/core/abstract_methods.rb

Because set_trace_func fires off a class end event for the current class being watched, I am able to check for the the end of the class definition, and determine whether a method was implemented, and throw an error if not.

After using TracePoint class to solve the abstract method problem, it opened my eyes to using it in several other areas, to solve various problems which are much easier to solve with deferred class evaluation (i.e. a mixin pattern which allows you to pass arguments into the module, or set them using class instance variables, which once again allows you to run the configuration behavior of the module being included lazily.)

It may strike you as janky using a library intended for debugging to do these kind of things, but I've compared for example the mixin implementations side by side, and the TracePoint version makes things much simpler, as well as opens up quite a few possibilities.

So anyways, not sure if this was even intentional since this diverges from MRI, but in the event it's not, I figured I'd plead my case for the reversion of said behavior :)

@enebo
Copy link
Member

enebo commented Jul 1, 2015

Historically, we did not support set_trace_func and eventing which it depends on because it slows down everything even when it is not being used by a measurable amount. If you use this behavior you have to supply --debug. This is a difference with MRI, but we made a decision to give a majority better performance at the cost of users using this flag when they want this behavior.

With that said...

In JRuby 9000 we instrument the events as instructions now. At the moment, we do not support this but we could re-instrument event instructions on demand so that --debug is not required. This is not a trivial amount of code to support but it is possible so there is hope.

Are you saynig your are seeing this many times in your app? Or perhaps once per restart?

@jasonayre
Copy link
Author

Ah, thanks for the info. Is the "eventing" referenced above at the java level? Also, does the performance hit from eventing take into consideration uses such as mine, where using TracePoint, Im watching for "class end", explicitly, and cancelling the watch after it hits?

Well, I'm seeing it multiple times per restart, (but once it finishes loading, it stops, though things still fail to load properly since it was disabled) because I've been using it as an include in multiple places, even outside of the abstract methods use case listed above, such as in
https://github.com/jasonayre/trax_core/blob/master/lib/trax/core/inheritance_hooks.rb
, basically as a way to define a inherited/class finished evaluating hook, because I can then access all of the constants/properties of that class.

Perhaps a clearer example than the abstract methods one above, is lets say an Enum class that we want to be able to define the enum values as constants -- i.e. (and Im not saying this is the best way to do this, but for sake of illustration)

class Enum
  include ::Trax::Core::InheritanceHooks

  after_inherited do
    constants.each do |name|
      #whatever logic to make it act like an enum here, but point being
      #constants/methods/props of the subclass arent accessible using self.inherited
      #since inherited is called immediately, before the class has finished evaluating,
      #so this after inherited hook just passes scope to subclass,
      #watches for the end of the current class definition,
      #and after its reached, stops watching via trace.disable
    end
  end
end

class Category < Enum
  CLOTHING = 1
  SHOES = 2
end

So anyways, after loading my app in jruby for the first time and seeing things break, figured Id pump the breaks on using these type of hooks further as I obviously don't want my lib to be jruby incompatible. But if it could be supported without the --debug flag, that would be super awesome. The abstract methods use case however is one where I just don't see any other (not completely wonky) way of implementing it in ruby.

@enebo
Copy link
Member

enebo commented Jul 1, 2015

@jasonayre we will be reviewing the ability to add the events on-demand once we start work (again) on profiled optimizations. This will be after 9.0.0.0 but will happen during point releases. Once we have a system for deopt'ing back to a safe point we should be able to weave in the events and replace with a newer version. Then no more --debug needed and if you enable it soon enough then the deopt for your user code will not happen since it will be one when it originally compiles it.

To answer your question, the cost is reduced when you no longer have anything registered but the code since no handlers are registered but the code which checks to see if there are handlers is still constantly being asked. So it is a bump in perf.

@jasonayre
Copy link
Author

@enebo coolio sounds good! as always, thanks for all your hard work @enebo and jruby team!

@headius
Copy link
Member

headius commented Mar 9, 2019

I did an experiment along these lines! #5633

So the bottom line is that we can support a number of static metadata trace events (what file is running, what line is being executed, etc) pretty easily; when not used those events won't happen and will largely be no-ops. When activated, code that was previously not set up for debug events will adjust;.

However, most user-facing hook usage is through set_trace_func which provides the guarantee of a caller binding on every hook call. That's kind of a deal breaker for optimizing stuff, since any scope might suddenly be needed to support hook code. We could make the binding lazy, allowing the cases that don't use binding to work fine and throwing an error for the rare cases where binding is not available but someone wants it...but it's not a good general solution.

MRI supports these features all the time because it doesn't really do that much optimization in the first place. Other implementations support this by basically throwing a bunch of optimizations away if anyone triggers the related event, leaving you with an app that expects set_trace_func to be a regular thing.

I have not figured out whether the balance between always-on trace events and production performance can be mitigated just yet. For the moment we are not going to integrate that PR, but it has taught us various things.

@danmayer
Copy link

I have gotten a few requests about having Coverband support JRuby (see the issue Github linked above)... I finally tried to get tests and a Rails app working, and found this open issue...

I believe what this issue means is that since set_trace_func doesn't run without --debug basically Coverage.peek_result is different as well... It looks to be missing hits on code that is loaded...

quick reproducible details:
https://gist.github.com/danmayer/c9b1558c2ba535c80c0ed238332db82d

I will try to get everything working with --debug and possibly say that is a way forward... I don't know enough about all the performance tradeoffs. Note that Coverband is a tool intended to run in production.

@headius
Copy link
Member

headius commented Apr 15, 2020

First off, I would strongly recommend against using any of the trace APIs in production. They cause overhead not just on JRuby, but on every Ruby implementation. Tracking every line event all the time should have a pretty large impact on runtime performance.

Over the years we have added back some events that happen infrequently enough that they can be always-on. I'm not certain which events always fire right now, but for example class entry/exit events or raise events are infrequent as well as having enough overhead on their own that triggering the event will be of negligible impact.

We are interested in making more events be always-on, but there are other problems that result: specifically, set_trace_func does not provide a way to differentiate events; you receive ALL of them all the time, and every event brings along a Binding. Those bindings have to be constructed even if they're not used, and the execution state they wrap needs to be available. That cripples many optimizations we do.

I'm not sure exactly what features you need available in production, so we should talk about that.

@danmayer
Copy link

So check out https://github.com/danmayer/coverband/ to learn more about the project. The project doesn't directly use set_trace_func but is using Coverage and I see a difference in the results of Coverage.peek_result, I get the results I expect when I set debug.fullTrace=true, so I assume that is somehow related to set_trace_func.

The project Coverband, has been running for years, and runs in production on hundreds of projects. We use it at a very large consumer site with heavy traffic, on the primary monolith and a number of smaller microservices. We have benchmarked with it on and off and can't see meaningful performance differences even at the p95 levels. It is very well battle-tested in terms of MRI. In fact, the Cookpad team, while not using Coverband now is doing the same pattern and runs Coverage on production. Here are some of their details on using Coverage https://speakerdeck.com/riseshia/cleaning-up-a-huge-ruby-application?slide=81

I haven't had a good opportunity to use JRuby, if you think the overhead is much more significant for JRuby, I could say that JRuby isn't supported as a platform. Is it perhaps better to make a new issue and separate the isolated cases related to Coverage from more general set_trace_func?

@headius
Copy link
Member

headius commented Apr 15, 2020

Ok, Coverage is a bit different.

Internally, both JRuby and CRuby have an "event hook" API. This API backs up set_trace_func, Tracepoint, and Coverage in different ways.

Because Coverage runs entirely in "native" code (Java in our case) and it mostly listens for line events, it's the cheapest of the three. It can be largely supported without removing our optimizations, but it's still far from free; every line of code has the additional overhead of triggering an event hook, which looks up file and line and increments a counter.

However, I do see that both the presentation you linked and Coverband itself uses the oneshot_lines feature added in Ruby 2.6. It may be possible to support this with lower overhead, since we'd only need to fire the event once per line. I assume your measurements are based on using oneshot_lines, which would explain why you don't see much overhead.

@danmayer
Copy link

So Coverband supports both older Coverage and oneshot, oneshot is definitely faster, but Coverband helps with a lot of older legacy systems so we currently support Ruby 2.3 and up.

I have measured both and honestly on a lot of older large systems even the overhead of Coverage when limited to app code, since Coverage only puts the hooks on files that are required after it, hasn't been significant in my benchmarking. The app code layer is such a small amount of called lines in a typical Rails (or Rack) app that the time spent is dwarfed by other things. This is the typical load pattern Coverband sets up.

Bundler.require(*Rails.groups) if defined?(Bundler)
require 'coverage'
Coverage.start
# ... some other things
load all app code

All this being said, if JRuby only wanted to support oneshot, we could have Coverband only support JRuby via oneshot mode. I could also get everything supported, and say it requires debug.fullTrace=true and isn't recommended due to performance concerns. The project has been around a long time and I have in the lift of the project heard of 3 apps requesting JRuby support. I know what Coverband is doing isn't super typical, so I don't want to try to force the feature into JRuby for a weird use case... I just found this ticket while trying to research if I could easily make everything work for folks.

@headius
Copy link
Member

headius commented Apr 15, 2020

Ok, so I looked into the C code that CRuby uses to implement oneshot_lines and it's about like I expected: once the event fires for a given line, they stop calling the event hook for that line and append the line number to the array of executed lines for a given file.

In CRuby this is easy because they always interpret. This allows them to rewrite these instructions at any time and you don't see much performance impact, because it's all equally "slow" like an interpreter.

For the same reasons JRuby doesn't support always-on coverage, the new Ruby JIT "mjit" also does not support event hooks (currently). When you enable an event hook API like Tracepoint or Coverage, JIT is immediately disabled. If this doesn't change, Coverband and oneshot_coverage will either not work or will hinder application performance even on CRuby.

@danmayer
Copy link

For the same reasons JRuby doesn't support always-on coverage, the new Ruby JIT "mjit" also does not support event hooks (currently). When you enable an event hook API like Tracepoint or Coverage, JIT is immediately disabled

Interesting... I will have to keep an eye on that it would be sad to loose access to such interesting and cool data... but I understand why it is problematic. So far, I haven't heard from anyone trying to run apps with the mjit enabled, but I will look a bit more into it.

@headius
Copy link
Member

headius commented Apr 15, 2020

We could also do the same thing CRuby does and disable JIT for files that have coverage enabled... but I don't think that's a good path to take either.

@headius
Copy link
Member

headius commented Apr 15, 2020

FWIW I just measured JRuby running a benchmark with and without --debug but no trace installed, and the performance is about the same:

$ jruby ../bench2018/bench_mandelbrot.rb
0.133393
0.136166
0.133395
0.16176300000000002
0.138471
0.140103
0.140105
0.140048
0.12916
0.130469
[5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
$ jruby --debug ../bench2018/bench_mandelbrot.rb
0.134773
0.13878900000000002
0.13229100000000002
0.145786
0.13273600000000002
0.136503
0.141527
0.131116
0.14265799999999998
0.131332
[5, 5, 5, 5, 5, 5, 5, 5, 5, 5]

With full Coverage enabled, this drops way down mostly due to the overhead of calling line events every time:

$ jruby --debug -rcoverage -e 'Coverage.start; load "../bench2018/bench_mandelbrot.rb"; puts Coverage.result'
7.616358
6.9107970000000005
6.891704
6.858162
6.861509
6.895537999999999
6.896193
7.049369
6.946976
6.870878
[5, 5, 5, 5, 5, 5, 5, 5, 5, 5]

It's probably possible to make oneshot_lines almost as cheap as running without --debug; each line would just go back to doing nothing once it has been encountered.

At this point the main reason we don't have these traces enabled for things like Coverage (which don't need a Binding) is to reduce the amount of extra code we would have to generate (which means longer startup and warmup time for apps, and possibly reduced native JIT optimization.)

@enebo Brainstorming a bit here...

I believe the big piece we're missing here is that we're still implementing Coverage using a global event hook. That seems like it's different in CRuby; they actually instrument the instruction sequences now and install hook events at each line. This allows them to turn those hooks off line-by-line as code is encountered.

Supporting this in JRuby would basically just be a matter of having our LineNumber operation include a flag saying if it had been encountered yet. Lines encountered during interpretation would never need additional tracing; the jitted version would simply not output that logic. Lines not encountered during interpretation would use a SwitchPoint to fire exactly once and mark the line as encountered, allowing the JVM JIT to later eliminate the trace event.

Since most code will execute in either our interpreter or the JVM's interpreter at least once before going to JIT, most JIT output would have no oneshot_lines coverage overhead at all.

@danmayer
Copy link

danmayer commented Apr 15, 2020

cool I will continue to follow this...

Also, I should note, yes Coverage on various types of apps can have significant overhead. It hasn't tended to show up in my benchmarks of web apps with limited usage... but it definitely adds overhead to every line run. I previously benchmarked Coverband against an ML lib, and it showed large differences, but since the target use case was web apps, the micro-benchmarks against rack/rails apps showed very little difference. While ApacheBenchmark tests against deployed apps showed, very slight difference...

Generated benchmarks always have some issues, so I have also relied on NewRelic and DataDog stats on a number of production systems to ensure low-performance overhead. I mostly look at impacts on the p95 scale.

If there is interest I have a series of posts on the various benchmarking efforts the project has gone through over the years.

@enebo
Copy link
Member

enebo commented Apr 15, 2020

@headius For line that seems like it would be cheap. For posterity, I am including my original thoughts on boiling the tracing ocean.

Startup interp gets all trace instrs and they more or less noop if tracing is off in interp (which will have a noticeable but hopefully small cost since we are emulating a machine). JIT will not emit any trace instrs initially. At entry to a JIT compiled unit OR when control returns (e.g. return from a call instr) we check to see if it is enabled (checkpoint) and then raise an exception. That takes us back to startup interp and we either immediately reJIT or just let it warmup again. I believe with switchpoint we could then probably have a very low overhead cost for not requiring --debug. I do naively think it will still be some cost. The obvious complexity is the same one we have with generic deoptimization...we need to continue from a particular ipc. If deoptimization existed already this would be pretty easy but there is some infrastructure needed to make this work.

This strategy could be applied to the startup interp as well but that would mean never eliminating the AST in any case where it coudl fall back. Memory tradeoff concern there.

@headius
Copy link
Member

headius commented Apr 17, 2020

@danmayer FWIW I just landed #6180 on master, which does two things:

  • Coverage is supported in all modes, and does not require --debug anymore.
  • oneshot_lines is supported and has nearly zero overhead beyond the initial flagging of each line.

The implementation largely follows what I suggested in #3096 (comment), which means the eventual JIT code will either have no coverage overhead at all, or it will use invokedynamic to limit that coverage to the first time executing a line. The native code that results should never have coverage overhead since it will always have run at least once before the native JIT gets to it.

There are some caveats though:

  • The line numbers are off in a few cases, like multi-line hash/array literals.
  • The methods and branches coverage modes are still not supported in any mode.

@danmayer
Copy link

wow, awesome to see/hear @headius yeah Ruby coverage has some issue with line number on multi-line statements.

No worries on methods and branches I don't use those features at all. I believe SimpleCov now supports branches, but it isn't required.

@headius
Copy link
Member

headius commented Apr 17, 2020

I'm going to close this issue.

The Coverage discussion was only tangentially related because it was based on event hooks before. Now it updates coverage data directly from the LineNumber instructions, and it can be used at any time without enabling debug mode.

So that leaves set_trace_func and TracePoint requiring debug mode.

We can make the event itself be basically free and always-on using the same trick as for oneshot_lines coverage. The down side of doing so is that we would have to emit the fallback logic everywhere that a trace might be needed in the future, which essentially means an additional aload and invokedynamic per line at a minimum. Even if they optimize away in the JIT, it still adds a lot of code and probably hurts warmup (and possibly startup time).

In addition, even though we could make the events be always-on, we can't do the same for the binding.

set_trace_func always provides a binding to the block, so we'd need to always allocate it. As a result, only the full debug mode can support set_trace_func (at great impact to performance).

For TracePoint, we could support partial "always-on" since you have to request the binding (via TracePoint#binding). We could either error or warn at that point, and non-binding traces would work fine in all execution modes.

Since it seems like most of the interest is in always-on Coverage rather than tracing, and we've implemented that in #6180, we'll call the rest of this "won't fix" until we can be convinced that the extra code size and warmup hit are worth always-on tracing.

@headius headius closed this as completed Apr 17, 2020
@headius headius added this to the Won't Fix milestone Apr 17, 2020
@headius
Copy link
Member

headius commented Apr 17, 2020

@danmayer We would love to have you try out the improved Coverage support on JRuby master! There are snapshot builds for every green CI run here:

https://oss.sonatype.org/content/repositories/snapshots/org/jruby/jruby-dist/9.3.0.0-SNAPSHOT/

And every night the most recent of these is repackaged as a nightly build here:

https://github.com/ruby/jruby-dev-builder/releases/latest

We do intend to match CRuby's line numbers so that will get fixed... but perhaps this works well enough for CoverBand already?

If you see issues that are not due to multi-line statements' line numbers, please open a new bug.

@danmayer
Copy link

I will try to give this a go this weekend thanks

@danmayer
Copy link

I got this installed and was able to verify basic scripts and get most of Coverband passing.

I did get one error:

CollectorsDeltaTest#test_one_shot_lines_results:
Mocha::StubbingError: stubbing non-existent method: Coverage.line_stub

This is defined here in CRuby: https://github.com/ruby/ruby/blob/c5eb24349a4535948514fe765c3ddb0628d81004/ext/coverage/lib/coverage.rb

Given it is just Ruby, I can work around this, but just adding that extension when running JRuby

require 'coverband/utils/jruby_ext' if RUBY_PLATFORM == 'java'

but I wanted to let you know, as I am not sure if that is something you would want to just directly support @headius

@danmayer
Copy link

saw that you. wanted anything else in a new issue, so I created #6181

@headius
Copy link
Member

headius commented Apr 18, 2020

@danmayer Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants