Anonymous class-type being raised by Timeout/Queue in both 9x and 1x. #3834

digitalextremist · 2016-04-28T14:46:31Z

Linux 3.13.0-48-generic #80-Ubuntu SMP x86_64

Both...

jruby 1.7.22 (2.0.0p598) 2016-03-25 c28f492 on Java HotSpot(TM) 64-Bit Server VM 1.8.0_77-b03 +jit [linux-amd64]
jruby 9.0.5.0 (2.2.3) 2016-03-30 7bee00d Java HotSpot(TM) 64-Bit Server VM 25.77-b03 on 1.8.0_77-b03 +jit [linux-amd64]

Working with Celluloid, and the http.rb gem, using Timeout.timeout on HTTP calls; in specific cases I can always crash an actor because I am unable to catch the exception raised.

I expected Timeout.timeout to raise Timeout::Error ...not this:

E, [2016-04-28T05:11:24.836000 #8088] ERROR -- : Actor crashed!
#<Class:0x915809c>: execution expired
  org/jruby/ext/thread/Queue.java:152:in `pop'
  org/jruby/ext/thread/Queue.java:127:in `pop'
  org/jruby/ext/thread/SizedQueue.java:111:in `pop'
  org/jruby/ext/fiber/ThreadFiber.java:183:in `yield'
  org/jruby/ext/fiber/ThreadFiber.java:166:in `yield'
  /usr/local/rvm/gems/jruby-1.7.22/gems/celluloid-0.17.3/lib/celluloid/task/fibered.rb:21:in `signal'
  /usr/local/rvm/gems/jruby-1.7.22/gems/celluloid-0.17.3/lib/celluloid/task.rb:72:in `suspend'
  /usr/local/rvm/gems/jruby-1.7.22/gems/celluloid-0.17.3/lib/celluloid/task.rb:11:in `suspend'
  /usr/local/rvm/gems/jruby-1.7.22/gems/celluloid-0.17.3/lib/celluloid.rb:139:in `suspend'
  /usr/local/rvm/gems/jruby-1.7.22/gems/celluloid-0.17.3/lib/celluloid/call/sync.rb:41:in `response'
  /usr/local/rvm/gems/jruby-1.7.22/gems/celluloid-0.17.3/lib/celluloid/call/sync.rb:45:in `value'
  /usr/local/rvm/gems/jruby-1.7.22/gems/celluloid-0.17.3/lib/celluloid/proxy/sync.rb:22:in `method_missing'
  /usr/local/rvm/rubies/jruby-1.7.22/lib/ruby/1.9/forwardable.rb:201:in `get_json'
  /home/filter/filter/preprocessor/analyzer/tester.rb:60:in `grey_white?'
  /home/filter/filter/preprocessor/analyzer.rb:121:in `process'
  org/jruby/ext/timeout/Timeout.java:115:in `timeout'
  /home/filter/filter/preprocessor/analyzer.rb:52:in `process'
  org/jruby/RubyKernel.java:1930:in `public_send'
  /usr/local/rvm/gems/jruby-1.7.22/gems/celluloid-0.17.3/lib/celluloid/calls.rb:28:in `dispatch'
  /usr/local/rvm/gems/jruby-1.7.22/gems/celluloid-0.17.3/lib/celluloid/call/async.rb:7:in `dispatch'
  /usr/local/rvm/gems/jruby-1.7.22/gems/celluloid-0.17.3/lib/celluloid/cell.rb:50:in `dispatch'
  /usr/local/rvm/gems/jruby-1.7.22/gems/celluloid-0.17.3/lib/celluloid/cell.rb:76:in `task'
  /usr/local/rvm/gems/jruby-1.7.22/gems/celluloid-0.17.3/lib/celluloid/actor.rb:339:in `task'
  /usr/local/rvm/gems/jruby-1.7.22/gems/celluloid-0.17.3/lib/celluloid/task.rb:44:in `initialize'
  /usr/local/rvm/gems/jruby-1.7.22/gems/celluloid-0.17.3/lib/celluloid/task/fibered.rb:14:in `create'
E, [2016-04-28T05:11:24.946000 #8088] ERROR -- : thread crashed
#<Class:0x915809c>: execution expired
  org/jruby/ext/thread/Queue.java:152:in `pop'
  org/jruby/ext/thread/Queue.java:127:in `pop'
  org/jruby/ext/thread/SizedQueue.java:111:in `pop'
  org/jruby/ext/fiber/ThreadFiber.java:183:in `yield'
  org/jruby/ext/fiber/ThreadFiber.java:166:in `yield'
  /usr/local/rvm/gems/jruby-1.7.22/gems/celluloid-0.17.3/lib/celluloid/task/fibered.rb:21:in `signal'
  /usr/local/rvm/gems/jruby-1.7.22/gems/celluloid-0.17.3/lib/celluloid/task.rb:72:in `suspend'
  /usr/local/rvm/gems/jruby-1.7.22/gems/celluloid-0.17.3/lib/celluloid/task.rb:11:in `suspend'
  /usr/local/rvm/gems/jruby-1.7.22/gems/celluloid-0.17.3/lib/celluloid.rb:139:in `suspend'
  /usr/local/rvm/gems/jruby-1.7.22/gems/celluloid-0.17.3/lib/celluloid/call/sync.rb:41:in `response'
  /usr/local/rvm/gems/jruby-1.7.22/gems/celluloid-0.17.3/lib/celluloid/call/sync.rb:45:in `value'
  /usr/local/rvm/gems/jruby-1.7.22/gems/celluloid-0.17.3/lib/celluloid/proxy/sync.rb:22:in `method_missing'
  /usr/local/rvm/rubies/jruby-1.7.22/lib/ruby/1.9/forwardable.rb:201:in `get_json'
  /home/filter/filter/preprocessor/analyzer/tester.rb:60:in `grey_white?'
  /home/filter/filter/preprocessor/analyzer.rb:121:in `process'
  org/jruby/ext/timeout/Timeout.java:115:in `timeout'
  /home/filter/filter/preprocessor/analyzer.rb:52:in `process'
  org/jruby/RubyKernel.java:1930:in `public_send'
  /usr/local/rvm/gems/jruby-1.7.22/gems/celluloid-0.17.3/lib/celluloid/calls.rb:28:in `dispatch'
  /usr/local/rvm/gems/jruby-1.7.22/gems/celluloid-0.17.3/lib/celluloid/call/async.rb:7:in `dispatch'
  /usr/local/rvm/gems/jruby-1.7.22/gems/celluloid-0.17.3/lib/celluloid/cell.rb:50:in `dispatch'
  /usr/local/rvm/gems/jruby-1.7.22/gems/celluloid-0.17.3/lib/celluloid/cell.rb:76:in `task'
  /usr/local/rvm/gems/jruby-1.7.22/gems/celluloid-0.17.3/lib/celluloid/actor.rb:339:in `task'
  /usr/local/rvm/gems/jruby-1.7.22/gems/celluloid-0.17.3/lib/celluloid/task.rb:44:in `initialize'
  /usr/local/rvm/gems/jruby-1.7.22/gems/celluloid-0.17.3/lib/celluloid/task/fibered.rb:14:in `create'

You can see by the message that it's a timeout, but the class which is raised bearing that message and backtrace is not Timeout::Error.

I've tried everything to catch that class type. Including using this sort of rescue:

rescue StandardError, Exception, Timeout::Error, Celluloid::TaskTerminated => ex

... which ought to be redundant in the extreme, but the class still escapes capture.

Unfortunately, to post a reproducible piece of code I would need to extract some from client code, but also replicate conditions that I've isolated in a certain context:

This is no normal HTTP timeout, though it does happen in that case at times -- less reliably. But it always happens when we take all nodes out of a load balancer, and then hit the load balancer with a GET request.

The text was updated successfully, but these errors were encountered:

Partial fix for #3834 The main issue here is that we frequently return this for DynamicMethod#dup, rather than return a new instance, and then this gets modified by code expecting it to be something different. This breaks reflective code because many reflective methods will expect the methods in a module's method table to say their implementationClass is that module. The case in #3834 broke because the accessor methods in Struct have this "fake" dup logic, so when the struct class itself was duplicated, the methods got modified implementationClass = clone and seemed to "disappear" from the original. This is only a partial fix because it just fixes those accessors in Struct. The proper fix would be to make sure DynamicMethod#dup returns a real new object for all DynamicMethod (except "dead" ones like UndefinedMethod). This may also fix (or make easier to fix) issues like #2988, where a dup'ed module's methods do not reflect their new implementationClass. Specs pending.

digitalextremist · 2016-04-28T19:21:25Z

@headius here's the full backtrace as requested:

https://gist.github.com/digitalextremist/830d649bc2f65e6261c47eafa12578a9

headius · 2016-04-28T20:22:07Z

@digitalextremist Ok, yeah...the exception seems to be raised by someone else via our thread eventing. That's harder to investigate since we don't have logging of cross-thread exception raises right now.

The exception itself looks like it's probably the anonymous exception Timeout throws.

It's unusual to me that it's bubbling out of a yield. Timing out in yield would mean that fiber A resumes fiber B, fiber B yields back to A, and A times out and bubbles this exception back to B. That's an unusual direction for exceptions to flow...generally they only come out of resume, not out of yield.

headius · 2016-04-28T20:25:57Z

Ok, in order to continue on this I think we need a reproduction, or @digitalextremist will need to do some debugging locally.

A patch to start investigating this would be to add logging to ThreadFiber.exchangeWithFiber's catch block to print out all exceptions. I suspect it is propagating something the wrong direction.

headius · 2016-04-28T20:39:43Z

I remember there used to be a Thread-based mode for Celluloid too...perhaps it still exists and we can try it?

headius · 2016-04-28T20:52:01Z

Another place to debug would be RubyThread.receiveMail. This method is called when thread A wants to deliver a kill or raise event to thread B, and it runs in the context of thread A. We could see who is triggering this cross-thread (cross-fiber) exception then.

digitalextremist · 2016-04-28T23:34:55Z

@headius I was able to work around this for now by cutting out Timeout.timeout and using the HTTP timeouts :per_operation which I believe triggers a timeout on a socket-level, rather than using the infamous Timeout API.

Either way I remember the situation clearly and will try to come back with a repro that works. I think the key here in reproducing it reliably would be to have a TCPServer in the repro itself that opens a connection but then does nothing, so there's not an immediate error of connection refused or similar.

Thank you very much for looking through this. Our interactions caused me to think of the workaround, brought up things in Celluloid that are fruitful. Like I said, I'm definitely down to ship beer :)

headius · 2016-05-11T19:25:20Z

Still a bit of a mystery and unlikely to be investigated for the quick 9.1.1 release. Punting.

digitalextremist · 2016-05-13T10:52:41Z

@headius this resurfaced with Rubinius, except the class did bubble up in that case.

In that incarnation, it appears to be Mutex#synchronize being bombarded... timing out at this specific line:

https://github.com/celluloid/celluloid-essentials/blob/master/lib/celluloid/internals/thread_handle.rb#L16

Which makes sense, because here is what complains:

https://github.com/celluloid/celluloid/blob/master/lib/celluloid/group/spawner.rb#L52

headius · 2016-05-13T12:53:43Z

Perhaps I missed it, but what was the actual exception class that bubbled out?

digitalextremist · 2016-05-13T14:03:40Z

Pardon me, it's Timeout::Error

headius · 2016-05-13T15:02:55Z

Ok...so is this just getting through because nobody expects to rescue it? Is there anything we need to fix on the JRuby side? I assume since the same issue appeared on Rubinius, it may be an exception-handling gap in Celluloid...

digitalextremist · 2016-05-13T16:17:50Z

I'm troubleshooting the Celluloid end -- but the fact that the class being bounced on the jRuby side is an anonymous class still persists; although yes, something is falling through on the Celluloid side.

headius · 2016-05-16T22:25:51Z

Ok, I can at least investigate why the class is anonymous.

headius · 2016-05-17T00:06:36Z

It looks like the anonymous exception code is very old...no longer present in MRI's timeout.rb after August 2013. I will do a pass over timeout.rb and our version and try to ensure they match.

headius · 2016-05-17T00:27:17Z

Ok, so here's the deal.

If you provide a specific exception type to Timeout::timout, it will raise a new instance of that exception when the timer runs out. If you do not provide an error. It eventually raises Timeout::Error. I say eventually, because MRI's timeout.rb has used various other exceptions to actually unroll the timeout-protected code back to the code in timeout.rb that decides how to propagate the exception.

At one point, it did Class.new(Exception) every time you called for a timeout guard. This is the anonymous exception we emulate currently (inside an exception type called AnonymousException but with its name removed to match MRI.

Later, they realized the high cost of creating a new class every time and instead created a single ExitException type they used to unroll.

And most recently it appears they now simply raise Timeout::Error right away but post-process it a bit to scrub out the backtrace lines related to Timeout::timeout.

The original intent was that this should be an exception users never see and never rescue so it propagates all the way out to the timeout call.

Now, the fact that you saw this exception makes me think we're not properly handling it when it bubbles out, and it's continuing to propagate into user code. That could be a bug, or more likely it's a problem with the asynchronous nature of Timeout causing the anon exception to be raised outside the timeout call.

I will do some experimenting and see if I can get this to reproduce based on my theory.

Our native Timeout has not been updated in a long time, and for * A separate anonymous exception is not used to trigger the timeout anymore; instead, they use catch/throw. * The exception message string is pre-allocated and frozen. * Various other miscellaneous changes. This version has *not* been tested for performance, and it will initially be much slower than our scheduled executor-based version due to it always spinning up a timeout thread. But we can see how it goes in test suites and then evaluate how to move forward. Note that the change to test/jruby/test_timeout.rb was to bring behavior in line with MRI 2.3.

headius · 2016-05-17T19:47:22Z

I have pushed a new_timeout branch to my fork (which reverts us back to using MRI's timeout.rb) and opened #3895 to test it out. If you get a chance, you could try to reproduce your issues on a build from that branch.

I doubt will stick with unmodified timeout.rb since it has a few big performance issues, but this will at least let us know how well it integrates into the rest of the system.

Our native Timeout has not been updated in a long time, and for * A separate anonymous exception is not used to trigger the timeout anymore; instead, they use catch/throw. * The exception message string is pre-allocated and frozen. * Various other miscellaneous changes. This version has *not* been tested for performance, and it will initially be much slower than our scheduled executor-based version due to it always spinning up a timeout thread. But we can see how it goes in test suites and then evaluate how to move forward. Note that the change to test/jruby/test_timeout.rb was to bring behavior in line with MRI 2.3.

headius · 2016-08-22T17:34:55Z

Workarounds include not using timeout and using it with an explicit exception, so I think we can punt any remaining rewrite of timeout.rb (or re-incorporation of MRI's version, with bottlenecks fixed) to 9.1.4.0.

Fixes jruby#3834. * Use Timeout::Error instead of leaky anonymous exception. * Move executor to instance data from static. * Clean up redundant, outdated code. * Move sharable pieces back into timeout.rb. * Eliminate builtin and load ext from timeout.rb.

headius · 2018-10-11T16:36:44Z

I've pushed an updated implementation of Timeout that shares more code with MRI and always uses the same Timeout::Error exception rather than our weird anonymous one. I believe this will resolve this issue.

headius added this to the JRuby 9.1.1.0 milestone Apr 28, 2016

headius added core JRuby 1.7.x JRuby 9000 labels Apr 28, 2016

digitalextremist mentioned this issue May 9, 2016

Aborting with SIGSEGV; the sequel, twice-over. rubinius/rubinius#3653

Closed

headius modified the milestones: JRuby 9.1.2.0, JRuby 9.1.1.0 May 11, 2016

enebo modified the milestones: JRuby 9.1.2.0, JRuby 9.1.3.0 May 23, 2016

enebo modified the milestones: JRuby 9.1.3.0, JRuby 9.1.2.0 May 23, 2016

headius modified the milestones: JRuby 9.1.4.0, JRuby 9.1.3.0 Aug 22, 2016

enebo modified the milestones: JRuby 9.1.4.0, JRuby 9.1.5.0, JRuby 9.1.6.0 Sep 1, 2016

headius modified the milestones: JRuby 9.2.0.0, JRuby 9.1.6.0 Nov 8, 2016

dgolombek mentioned this issue Oct 23, 2017

Timeout during SSL connection should be treated as timeout in JRuby nahi/httpclient#381

Closed

headius modified the milestones: JRuby 9.2.0.0, JRuby 9.2.1.0 May 15, 2018

headius mentioned this issue Oct 11, 2018

Realign timeout with MRI. #5354

Merged

headius closed this as completed in #5354 Oct 11, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GitHub Sponsors

Anonymous class-type being raised by Timeout/Queue in both 9x and 1x. #3834

Anonymous class-type being raised by Timeout/Queue in both 9x and 1x. #3834

digitalextremist commented Apr 28, 2016 •

edited

Loading

digitalextremist commented Apr 28, 2016

headius commented Apr 28, 2016

headius commented Apr 28, 2016

headius commented Apr 28, 2016

headius commented Apr 28, 2016

digitalextremist commented Apr 28, 2016

headius commented May 11, 2016

digitalextremist commented May 13, 2016

headius commented May 13, 2016

digitalextremist commented May 13, 2016

headius commented May 13, 2016

digitalextremist commented May 13, 2016

headius commented May 16, 2016

headius commented May 17, 2016

headius commented May 17, 2016

headius commented May 17, 2016

headius commented Aug 22, 2016

headius commented Oct 11, 2018

Anonymous class-type being raised by Timeout/Queue in both 9x and 1x. #3834

Anonymous class-type being raised by Timeout/Queue in both 9x and 1x. #3834

Comments

digitalextremist commented Apr 28, 2016 • edited Loading

digitalextremist commented Apr 28, 2016

headius commented Apr 28, 2016

headius commented Apr 28, 2016

headius commented Apr 28, 2016

headius commented Apr 28, 2016

digitalextremist commented Apr 28, 2016

headius commented May 11, 2016

digitalextremist commented May 13, 2016

headius commented May 13, 2016

digitalextremist commented May 13, 2016

headius commented May 13, 2016

digitalextremist commented May 13, 2016

headius commented May 16, 2016

headius commented May 17, 2016

headius commented May 17, 2016

headius commented May 17, 2016

headius commented Aug 22, 2016

headius commented Oct 11, 2018

digitalextremist commented Apr 28, 2016 •

edited

Loading