Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lock is not released at Kernel.require when Gem.try_activate throws RuntimeException #3652

Closed
frsyuki opened this issue Feb 9, 2016 · 8 comments

Comments

@frsyuki
Copy link
Contributor

frsyuki commented Feb 9, 2016

Issues

  • (a) require defined at stdlib/rubygems/core_ext/kernel_require.rb locks a monitor but doesn't release it if Gem.try_active throws a RuntimeException.
  • (b) It throws RuntimeException at Gem.active with "can't add a new key into hash during iteration" message if another thread is iterating Gem.loaded_specs.

Details

My application is using JRuby. I found that many threads are blocked at the same method (full stacktrace is at https://gist.github.com/frsyuki/17601f3b1483adbd6019#file-sigdump-3892-log ):

Thread #<Thread:0x33eb4afe> status=sleep priority=0
    uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/monitor.rb:185:in `lock'
    uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/monitor.rb:185:in `mon_enter'
    uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:39:in `require'
    uri:classloader:/embulk/guess_plugin.rb:89:in `guess'
    uri:classloader:/embulk/guess_plugin.rb:24:in `guess'
  In Java "guess" Id=100 WAITING on java.util.concurrent.locks.ReentrantLock$NonfairSync@6d71dcba owned by "qtp54440148-21" Id=21
      at sun.misc.Unsafe.park(Native Method)
      -  waiting on java.util.concurrent.locks.ReentrantLock$NonfairSync@6d71dcba
      at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireInterruptibly(AbstractQueuedSynchronizer.java:897)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1222)
      at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335)
      at org.jruby.RubyThread.lockInterruptibly(RubyThread.java:1844)
      at org.jruby.ext.thread.Mutex.lock(Mutex.java:91)
      ...

I got this Ruby stacktrace using sigdump.

Then, I found following error happened 6 hours ago (full stacktrace is at https://gist.github.com/frsyuki/17601f3b1483adbd6019#file-error1-log-L70 )

...
Caused by: org.jruby.exceptions.RaiseException: (RuntimeError) can't add a new key into hash during iteration
    at org.jruby.RubyHash.[]=(org/jruby/RubyHash.java:1006)
    at RUBY.activate(uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/specification.rb:1285)
    at RUBY.try_activate(uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems.rb:196)
    at uri_3a_classloader_3a_.META_minus_INF.jruby_dot_home.lib.ruby.stdlib.rubygems.core_ext.kernel_require.require(uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:126)
    at RUBY.require_and_show(uri:classloader:/embulk/plugin_registry.rb:79)
    at RUBY.search(uri:classloader:/embulk/plugin_registry.rb:39)
    at uri_3a_classloader_3a_.embulk.plugin_registry.lookup(uri:classloader:/embulk/plugin_registry.rb:26)
    at uri_3a_classloader_3a_.embulk.plugin.lookup(uri:classloader:/embulk/plugin.rb:197)
    at RUBY.new_java_parser(uri:classloader:/embulk/plugin.rb:167)
    at RUBY.new_java_parser(uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/forwardable.rb:183)

This error happened at lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:126. It assumes that Gem.try_activate doesn't throw exception and doesn't release acquired lock if an error happened.

I'm not sure which thread was iterating Gem.loaded_specs object. But it seems a public API which could be called by arbitrary code.

Environment

  • JRuby version: 9.0.0.0
  • Java version: 1.8.0_51 64-Bit Server VM
@frsyuki
Copy link
Contributor Author

frsyuki commented Feb 9, 2016

I think that improving lock code at require method to be more defensive will solve deadlock. But it still causes RuntimeError. Serializing all access to Gem.loaded_specs object, or public API of it returns copy of @loaded_spec probably solves the entire problem but I don't know that is a possible. Any ideas are there?

frsyuki added a commit to embulk/embulk that referenced this issue Feb 9, 2016
Serializes access to Gem.loaded_specs to mitigate probability of
possible deadlock of jruby/jruby#3652.
frsyuki added a commit to embulk/embulk that referenced this issue Feb 9, 2016
Serializes access to Gem.loaded_specs to mitigate probability of
possible deadlock of jruby/jruby#3652.
@headius
Copy link
Member

headius commented Feb 13, 2016

So is this a bug in RubyGems then? The locks in question are in kernel_require and that's not something we modify; it's just stock RubyGems. Perhaps this needs to be reported to them.

It appears that in a very recent RubyGems on our ruby-2.3 branch, it's still not quite right.

@headius
Copy link
Member

headius commented Feb 13, 2016

Here's a patch I think will work. @frsyuki if you could test this patch, all that's left is reporting to RubyGems and getting a release we can use.

diff --git a/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb b/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb
index f9b67ea..accbf53 100644
--- a/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb
+++ b/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb
@@ -120,14 +120,17 @@ module Kernel
   rescue LoadError => load_error
     RUBYGEMS_ACTIVATION_MONITOR.enter

-    if load_error.message.start_with?("Could not find") or
-        (load_error.message.end_with?(path) and Gem.try_activate(path)) then
-      RUBYGEMS_ACTIVATION_MONITOR.exit
-      return gem_original_require(path)
-    else
+    begin
+      if load_error.message.start_with?("Could not find") or
+          (load_error.message.end_with?(path) and Gem.try_activate(path)) then
+        require_again = true
+      end
+    ensure
       RUBYGEMS_ACTIVATION_MONITOR.exit
     end

+    return gem_original_require(path) if require_again
+
     raise load_error
   end

@frsyuki
Copy link
Contributor Author

frsyuki commented Feb 14, 2016

@headius thank you. I'll test that patch.

@frsyuki
Copy link
Contributor Author

frsyuki commented Feb 22, 2016

@headius I tried above patch for 1 week with @muga. Deadlock is not happening at least for now. Thank you!
I believe we can say that this patch solves the deadlock problem. So, the worst case is fixed but I think it's not 100% fix because require fails once.
Should I move this discussion to rubygems community, or CRuby maybe?

@headius
Copy link
Member

headius commented Mar 4, 2016

I think we should move this discussion to a RubyGems issue. This appears to be their bug.

headius added a commit to headius/rubygems that referenced this issue Mar 7, 2016
See jruby/jruby#3652.

I'm not sure how best to test this; the original report has a
rather complicated env setup not suitable for RG test suite.

There may be locking required around the iteration of
Gem.loaded_specs, in case parallel gem activation is attempting to
mutate that structure concurrently.
@headius
Copy link
Member

headius commented Mar 7, 2016

I have filed rubygems/rubygems#1538. We'll leave this open until we can update 1.7 and 9k to the fixed version.

homu added a commit to rubygems/rubygems that referenced this issue Mar 9, 2016
Ensure we unlock the monitor even if try_activate throws.

# Description:

Not all paths ensured the monitor was unlocked. Specifically `try_activat`e could throw here (possibly exposing an internal bug, but still) and leave the monitor locked, preventing any future requires from entering the Gem require. This patch modifies the `try_activate` logic to always unlock the monitor, success or fail.

See jruby/jruby#3652.

I'm not sure how best to test this; the original report has a
rather complicated env setup not suitable for RG test suite.

There may be locking required around the iteration of
Gem.loaded_specs, in case parallel gem activation is attempting to
mutate that structure concurrently.

# Tasks:

- [x] Describe the problem / feature
- [x] Write tests

I'm not sure how to write a test for this. We'd have to get `try_activate` to raise.

- [x] Write code to solve the problem
- [ ] Get code review from coworkers / friends
- [x] [Squash commits](http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html)

I will abide by the [code of conduct](https://github.com/rubygems/rubygems/blob/master/CODE_OF_CONDUCT.md).
homu added a commit to rubygems/rubygems that referenced this issue Mar 9, 2016
Ensure we unlock the monitor even if try_activate throws.

# Description:

Not all paths ensured the monitor was unlocked. Specifically `try_activat`e could throw here (possibly exposing an internal bug, but still) and leave the monitor locked, preventing any future requires from entering the Gem require. This patch modifies the `try_activate` logic to always unlock the monitor, success or fail.

See jruby/jruby#3652.

I'm not sure how best to test this; the original report has a
rather complicated env setup not suitable for RG test suite.

There may be locking required around the iteration of
Gem.loaded_specs, in case parallel gem activation is attempting to
mutate that structure concurrently.

# Tasks:

- [x] Describe the problem / feature
- [x] Write tests

I'm not sure how to write a test for this. We'd have to get `try_activate` to raise.

- [x] Write code to solve the problem
- [ ] Get code review from coworkers / friends
- [x] [Squash commits](http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html)

I will abide by the [code of conduct](https://github.com/rubygems/rubygems/blob/master/CODE_OF_CONDUCT.md).
homu added a commit to rubygems/rubygems that referenced this issue Mar 10, 2016
Ensure we unlock the monitor even if try_activate throws.

# Description:

Not all paths ensured the monitor was unlocked. Specifically `try_activat`e could throw here (possibly exposing an internal bug, but still) and leave the monitor locked, preventing any future requires from entering the Gem require. This patch modifies the `try_activate` logic to always unlock the monitor, success or fail.

See jruby/jruby#3652.

I'm not sure how best to test this; the original report has a
rather complicated env setup not suitable for RG test suite.

There may be locking required around the iteration of
Gem.loaded_specs, in case parallel gem activation is attempting to
mutate that structure concurrently.

# Tasks:

- [x] Describe the problem / feature
- [x] Write tests

I'm not sure how to write a test for this. We'd have to get `try_activate` to raise.

- [x] Write code to solve the problem
- [ ] Get code review from coworkers / friends
- [x] [Squash commits](http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html)

I will abide by the [code of conduct](https://github.com/rubygems/rubygems/blob/master/CODE_OF_CONDUCT.md).
homu added a commit to rubygems/rubygems that referenced this issue Mar 11, 2016
Ensure we unlock the monitor even if try_activate throws.

# Description:

Not all paths ensured the monitor was unlocked. Specifically `try_activat`e could throw here (possibly exposing an internal bug, but still) and leave the monitor locked, preventing any future requires from entering the Gem require. This patch modifies the `try_activate` logic to always unlock the monitor, success or fail.

See jruby/jruby#3652.

I'm not sure how best to test this; the original report has a
rather complicated env setup not suitable for RG test suite.

There may be locking required around the iteration of
Gem.loaded_specs, in case parallel gem activation is attempting to
mutate that structure concurrently.

# Tasks:

- [x] Describe the problem / feature
- [x] Write tests

I'm not sure how to write a test for this. We'd have to get `try_activate` to raise.

- [x] Write code to solve the problem
- [ ] Get code review from coworkers / friends
- [x] [Squash commits](http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html)

I will abide by the [code of conduct](https://github.com/rubygems/rubygems/blob/master/CODE_OF_CONDUCT.md).
indirect pushed a commit to rubygems/rubygems that referenced this issue Mar 11, 2016
See jruby/jruby#3652.

I'm not sure how best to test this; the original report has a
rather complicated env setup not suitable for RG test suite.

There may be locking required around the iteration of
Gem.loaded_specs, in case parallel gem activation is attempting to
mutate that structure concurrently.
homu added a commit to rubygems/rubygems that referenced this issue Mar 11, 2016
Ensure we unlock the monitor even if try_activate throws.

# Description:

Not all paths ensured the monitor was unlocked. Specifically `try_activat`e could throw here (possibly exposing an internal bug, but still) and leave the monitor locked, preventing any future requires from entering the Gem require. This patch modifies the `try_activate` logic to always unlock the monitor, success or fail.

See jruby/jruby#3652.

I'm not sure how best to test this; the original report has a
rather complicated env setup not suitable for RG test suite.

There may be locking required around the iteration of
Gem.loaded_specs, in case parallel gem activation is attempting to
mutate that structure concurrently.

# Tasks:

- [x] Describe the problem / feature
- [x] Write tests

I'm not sure how to write a test for this. We'd have to get `try_activate` to raise.

- [x] Write code to solve the problem
- [ ] Get code review from coworkers / friends
- [x] [Squash commits](http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html)

I will abide by the [code of conduct](https://github.com/rubygems/rubygems/blob/master/CODE_OF_CONDUCT.md).
@headius
Copy link
Member

headius commented Mar 14, 2016

Fixed by RG 2.6.2 in fb71835. See rubygems/rubygems#1538 for details.

@headius headius closed this as completed Mar 14, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants