Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

autoload & require deadlock #3341

Closed
bignacio opened this issue Sep 21, 2015 · 12 comments
Closed

autoload & require deadlock #3341

bignacio opened this issue Sep 21, 2015 · 12 comments
Assignees
Milestone

Comments

@bignacio
Copy link

I run into an interesting problem when using require and autoload for the same classes.

I only tested it against JRuby 1.7.21 and quickly looking at the code I'd attributed the problem to combination of a global lock for autoload consts and the require lock.

Since autoload at some point will call require and try to acquire a lock for a file, that same file can contain autoloaded classes that in turn will need to acquire the autoload lock.

I wrote a little, perhaps convoluted, sample code to reproduce the problem. It is gisted here https://gist.github.com/bignacio/7a947960113775477c72 and I copy it below

Save all the files into the same directory and run

$ jruby main.rb

as any good deadlock, it won't happen all the time so you can do something like this

$ while [ true ] ; do jruby main.rb ; done

and wait untill it hangs.

You can confirm the dealock with jstack
Once the process is hung, get its pid and run

jstack -l < pid >

# another_class_base.rb
module DeadlockTest
  class AnotherClassBase
  end
end

# file:  class_a.rb
module DeadlockTest
  autoload :AnotherClassBase, 'another_class_base'
  class ClassA
    class AnotherClass < AnotherClassBase
    end
  end
end

# file:  deadlock_test.rb
module DeadlockTest
  autoload :ClassA, 'class_a'
end

# file: main.rb
require 'thread'
require 'deadlock_test'

puts("Starting, JRuby version: #{JRUBY_VERSION}")
ta = Thread.new{
  puts("Thread A #{JRuby.reference(Thread.current).native_thread}\n")

  # This is not a usual thing for people to do but it can happen if an external component is loading
  # your ruby code
  # I first run into this problem in a storm topology (via RedStorm), which does
  # a similar thing. I'm sure there are other ways to reproduce the same problem
  # without a require but this
  require 'class_a'
}

tb = Thread.new{
  puts("Thread B #{JRuby.reference(Thread.current).native_thread}\n")

  o = DeadlockTest::ClassA.new
  puts('if you see this, try again :) ')
}

ta.join
tb.join

jstack output

Found one Java-level deadlock:
=============================
"Ruby-0-Thread-3: main.rb:1":
  waiting for ownable synchronizer 0x00000007b65c1c40, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
  which is held by "Ruby-0-Thread-2: main.rb:1"
"Ruby-0-Thread-2: main.rb:1":
  waiting to lock monitor 0x00007f84e9014b38 (object 0x00000007b6408c78, a java.lang.Object),
  which is held by "Ruby-0-Thread-3: main.rb:1"

Java stack information for the threads listed above:
===================================================
"Ruby-0-Thread-3: main.rb:1":
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000007b65c1c40> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
    at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
    at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
    at org.jruby.runtime.load.LoadService$RequireLocks.lock(LoadService.java:509)
    at org.jruby.runtime.load.LoadService$RequireLocks.access$200(LoadService.java:478)
    at org.jruby.runtime.load.LoadService.requireCommon(LoadService.java:444)
    at org.jruby.runtime.load.LoadService.autoloadRequire(LoadService.java:418)
    at org.jruby.RubyKernel$1.load(RubyKernel.java:202)
    at org.jruby.RubyModule$Autoload.getConstant(RubyModule.java:3832)
    - locked <0x00000007b6408c78> (a java.lang.Object)
    at org.jruby.RubyModule.getAutoloadConstant(RubyModule.java:3621)
    at org.jruby.RubyModule.resolveUndefConstant(RubyModule.java:3108)
    at org.jruby.RubyModule.getConstantFromNoConstMissing(RubyModule.java:3076)
    at org.jruby.ast.executable.RuntimeCache.reCacheFrom(RuntimeCache.java:442)
    at org.jruby.ast.executable.RuntimeCache.getValueFrom(RuntimeCache.java:436)
    at org.jruby.ast.executable.RuntimeCache.getConstantFrom(RuntimeCache.java:429)
    at org.jruby.ast.executable.AbstractScript.getConstantFrom8(AbstractScript.java:336)
    at main.block_1$RUBY$__file__(main.rb:19)
    at main$block_1$RUBY$__file__.call(main$block_1$RUBY$__file__)
    at org.jruby.runtime.CompiledBlock19.yield(CompiledBlock19.java:159)
    at org.jruby.runtime.CompiledBlock19.call(CompiledBlock19.java:87)
    at org.jruby.runtime.Block.call(Block.java:101)
    at org.jruby.RubyProc.call(RubyProc.java:290)
    at org.jruby.RubyProc.call(RubyProc.java:228)
    at org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:99)
    at java.lang.Thread.run(Thread.java:745)
"Ruby-0-Thread-2: main.rb:1":
    at org.jruby.RubyModule$Autoload.getConstant(RubyModule.java:3825)
    - waiting to lock <0x00000007b6408c78> (a java.lang.Object)
    at org.jruby.RubyModule.getAutoloadConstant(RubyModule.java:3621)
    at org.jruby.RubyModule.resolveUndefConstant(RubyModule.java:3108)
    at org.jruby.RubyModule.getConstantAtSpecial(RubyModule.java:2968)
    at org.jruby.RubyModule.defineOrGetClassUnder(RubyModule.java:1281)
    at org.jruby.ast.ClassNode.interpret(ClassNode.java:134)
    at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
    at org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
    at org.jruby.evaluator.ASTInterpreter.INTERPRET_CLASS(ASTInterpreter.java:103)
    at org.jruby.evaluator.ASTInterpreter.evalClassDefinitionBody(ASTInterpreter.java:280)
    at org.jruby.ast.ModuleNode.interpret(ModuleNode.java:120)
    at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
    at org.jruby.ast.RootNode.interpret(RootNode.java:129)
    at org.jruby.evaluator.ASTInterpreter.INTERPRET_ROOT(ASTInterpreter.java:121)
    at org.jruby.Ruby.runInterpreter(Ruby.java:894)
    at org.jruby.Ruby.loadFile(Ruby.java:2860)
    at org.jruby.runtime.load.ExternalScript.load(ExternalScript.java:66)
    at org.jruby.runtime.load.LoadService.tryLoadingLibraryOrScript(LoadService.java:892)
    at org.jruby.runtime.load.LoadService.requireCommon(LoadService.java:465)
    at org.jruby.runtime.load.LoadService.require(LoadService.java:414)
    at org.jruby.RubyKernel.requireCommon(RubyKernel.java:1047)
    at org.jruby.RubyKernel.require19(RubyKernel.java:1040)
    at org.jruby.RubyKernel$INVOKER$s$1$0$require19.call(RubyKernel$INVOKER$s$1$0$require19.gen)
    at org.jruby.internal.runtime.methods.JavaMethod$JavaMethodOneOrNBlock.call(JavaMethod.java:352)
    at org.jruby.internal.runtime.methods.AliasMethod.call(AliasMethod.java:61)
    at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:168)
    at org.jruby.ast.FCallOneArgNode.interpret(FCallOneArgNode.java:36)
    at org.jruby.ast.ReturnNode.interpret(ReturnNode.java:92)
    at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
    at org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
    at org.jruby.ast.IfNode.interpret(IfNode.java:116)
    at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
    at org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
    at org.jruby.ast.RescueNode.executeBody(RescueNode.java:221)
    at org.jruby.ast.RescueNode.interpret(RescueNode.java:116)
    at org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
    at org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:182)
    at org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:203)
    at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:326)
    at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:170)
    at main.block_0$RUBY$__file__(main.rb:13)
    at main$block_0$RUBY$__file__.call(main$block_0$RUBY$__file__)
    at org.jruby.runtime.CompiledBlock19.yield(CompiledBlock19.java:159)
    at org.jruby.runtime.CompiledBlock19.call(CompiledBlock19.java:87)
    at org.jruby.runtime.Block.call(Block.java:101)
    at org.jruby.RubyProc.call(RubyProc.java:290)
    at org.jruby.RubyProc.call(RubyProc.java:228)
    at org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:99)
    at java.lang.Thread.run(Thread.java:745)

Found 1 deadlock.
@bignacio bignacio changed the title autoload & require dealock autoload & require deadlock Sep 21, 2015
@mkristian
Copy link
Member

I can reproduce it with jruby-1.7.x but so far did not see it in jruby-9k

@mkristian
Copy link
Member

also on jruby-9k but needed quite a few trials :)

@bignacio
Copy link
Author

There's more, I was able to create a similar problem without autoloads. The require lock itself allows for deadlocks

#file: dltest.rb
require 'thread'

ta = Thread.new { 
  require './class_a'
  require './class_b'
}

tb = Thread.new {
  require './class_b'
  require './class_a'
}

ta.join
tb.join

puts('Try again')

#file: class_a.rb
require './class_b'
require './class_c'

class ClassA
  def initialize
    @d = ClassC.new
  end
end

#file: class_b.rb
require './class_a'
require './class_c'

class ClassB
  def initialize
    @d = ClassC.new
  end
end

#file: class_c.rb
class ClassC
  def initialize

  end
end
# this sleep makes it easier to repro the problem
sleep(1)

run it in the same stubborn way

while [ true ] ; do jruby dltest.rb ; done

this is the jstack output for this test

Found one Java-level deadlock:
=============================
"Ruby-0-Thread-3: dltest.rb:1":
  waiting for ownable synchronizer 0x00000007b6475aa0, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
  which is held by "Ruby-0-Thread-2: dltest.rb:1"
"Ruby-0-Thread-2: dltest.rb:1":
  waiting for ownable synchronizer 0x00000007b6519890, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
  which is held by "Ruby-0-Thread-3: dltest.rb:1"

Java stack information for the threads listed above:
===================================================
"Ruby-0-Thread-3: dltest.rb:1":
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000007b6475aa0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
    at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
    at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
    at org.jruby.runtime.load.LoadService$RequireLocks.lock(LoadService.java:509)
    at org.jruby.runtime.load.LoadService$RequireLocks.access$200(LoadService.java:478)
    at org.jruby.runtime.load.LoadService.requireCommon(LoadService.java:444)
    at org.jruby.runtime.load.LoadService.require(LoadService.java:414)
    at org.jruby.RubyKernel.requireCommon(RubyKernel.java:1047)
    at org.jruby.RubyKernel.require19(RubyKernel.java:1040)
    at org.jruby.RubyKernel$INVOKER$s$1$0$require19.call(RubyKernel$INVOKER$s$1$0$require19.gen)
    at org.jruby.internal.runtime.methods.JavaMethod$JavaMethodOneOrNBlock.call(JavaMethod.java:352)
    at org.jruby.internal.runtime.methods.AliasMethod.call(AliasMethod.java:61)
    at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:168)
    at org.jruby.ast.FCallOneArgNode.interpret(FCallOneArgNode.java:36)
    at org.jruby.ast.ReturnNode.interpret(ReturnNode.java:92)
    at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
    at org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
    at org.jruby.ast.IfNode.interpret(IfNode.java:116)
    at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
    at org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
    at org.jruby.ast.RescueNode.executeBody(RescueNode.java:221)
    at org.jruby.ast.RescueNode.interpret(RescueNode.java:116)
    at org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
    at org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:182)
    at org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:203)
    at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:326)
    at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:170)
    at org.jruby.ast.FCallOneArgNode.interpret(FCallOneArgNode.java:36)
    at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
    at org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
    at org.jruby.ast.RootNode.interpret(RootNode.java:129)
    at org.jruby.evaluator.ASTInterpreter.INTERPRET_ROOT(ASTInterpreter.java:121)
    at org.jruby.Ruby.runInterpreter(Ruby.java:894)
    at org.jruby.Ruby.loadFile(Ruby.java:2860)
    at org.jruby.runtime.load.LibrarySearcher$ResourceLibrary.loadScript(LibrarySearcher.java:264)
    at org.jruby.runtime.load.LibrarySearcher$ResourceLibrary.load(LibrarySearcher.java:252)
    at org.jruby.runtime.load.LibrarySearcher$FoundLibrary.load(LibrarySearcher.java:51)
    at org.jruby.runtime.load.LoadService.tryLoadingLibraryOrScript(LoadService.java:892)
    at org.jruby.runtime.load.LoadService.requireCommon(LoadService.java:465)
    at org.jruby.runtime.load.LoadService.require(LoadService.java:414)
    at org.jruby.RubyKernel.requireCommon(RubyKernel.java:1047)
    at org.jruby.RubyKernel.require19(RubyKernel.java:1040)
    at org.jruby.RubyKernel$INVOKER$s$1$0$require19.call(RubyKernel$INVOKER$s$1$0$require19.gen)
    at org.jruby.internal.runtime.methods.JavaMethod$JavaMethodOneOrNBlock.call(JavaMethod.java:352)
    at org.jruby.internal.runtime.methods.AliasMethod.call(AliasMethod.java:61)
    at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:168)
    at org.jruby.ast.FCallOneArgNode.interpret(FCallOneArgNode.java:36)
    at org.jruby.ast.ReturnNode.interpret(ReturnNode.java:92)
    at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
    at org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
    at org.jruby.ast.IfNode.interpret(IfNode.java:116)
    at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
    at org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
    at org.jruby.ast.RescueNode.executeBody(RescueNode.java:221)
    at org.jruby.ast.RescueNode.interpret(RescueNode.java:116)
    at org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
    at org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:182)
    at org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:203)
    at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:326)
    at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:170)
    at dltest.block_1$RUBY$__file__(dltest.rb:9)
    at dltest$block_1$RUBY$__file__.call(dltest$block_1$RUBY$__file__)
    at org.jruby.runtime.CompiledBlock19.yield(CompiledBlock19.java:159)
    at org.jruby.runtime.CompiledBlock19.call(CompiledBlock19.java:87)
    at org.jruby.runtime.Block.call(Block.java:101)
    at org.jruby.RubyProc.call(RubyProc.java:290)
    at org.jruby.RubyProc.call(RubyProc.java:228)
    at org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:99)
    at java.lang.Thread.run(Thread.java:745)
"Ruby-0-Thread-2: dltest.rb:1":
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000007b6519890> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
    at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
    at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
    at org.jruby.runtime.load.LoadService$RequireLocks.lock(LoadService.java:509)
    at org.jruby.runtime.load.LoadService$RequireLocks.access$200(LoadService.java:478)
    at org.jruby.runtime.load.LoadService.requireCommon(LoadService.java:444)
    at org.jruby.runtime.load.LoadService.require(LoadService.java:414)
    at org.jruby.RubyKernel.requireCommon(RubyKernel.java:1047)
    at org.jruby.RubyKernel.require19(RubyKernel.java:1040)
    at org.jruby.RubyKernel$INVOKER$s$1$0$require19.call(RubyKernel$INVOKER$s$1$0$require19.gen)
    at org.jruby.internal.runtime.methods.JavaMethod$JavaMethodOneOrNBlock.call(JavaMethod.java:352)
    at org.jruby.internal.runtime.methods.AliasMethod.call(AliasMethod.java:61)
    at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:168)
    at org.jruby.ast.FCallOneArgNode.interpret(FCallOneArgNode.java:36)
    at org.jruby.ast.ReturnNode.interpret(ReturnNode.java:92)
    at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
    at org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
    at org.jruby.ast.IfNode.interpret(IfNode.java:116)
    at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
    at org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
    at org.jruby.ast.RescueNode.executeBody(RescueNode.java:221)
    at org.jruby.ast.RescueNode.interpret(RescueNode.java:116)
    at org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
    at org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:182)
    at org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:203)
    at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:326)
    at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:170)
    at org.jruby.ast.FCallOneArgNode.interpret(FCallOneArgNode.java:36)
    at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
    at org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
    at org.jruby.ast.RootNode.interpret(RootNode.java:129)
    at org.jruby.evaluator.ASTInterpreter.INTERPRET_ROOT(ASTInterpreter.java:121)
    at org.jruby.Ruby.runInterpreter(Ruby.java:894)
    at org.jruby.Ruby.loadFile(Ruby.java:2860)
    at org.jruby.runtime.load.LibrarySearcher$ResourceLibrary.loadScript(LibrarySearcher.java:264)
    at org.jruby.runtime.load.LibrarySearcher$ResourceLibrary.load(LibrarySearcher.java:252)
    at org.jruby.runtime.load.LibrarySearcher$FoundLibrary.load(LibrarySearcher.java:51)
    at org.jruby.runtime.load.LoadService.tryLoadingLibraryOrScript(LoadService.java:892)
    at org.jruby.runtime.load.LoadService.requireCommon(LoadService.java:465)
    at org.jruby.runtime.load.LoadService.require(LoadService.java:414)
    at org.jruby.RubyKernel.requireCommon(RubyKernel.java:1047)
    at org.jruby.RubyKernel.require19(RubyKernel.java:1040)
    at org.jruby.RubyKernel$INVOKER$s$1$0$require19.call(RubyKernel$INVOKER$s$1$0$require19.gen)
    at org.jruby.internal.runtime.methods.JavaMethod$JavaMethodOneOrNBlock.call(JavaMethod.java:352)
    at org.jruby.internal.runtime.methods.AliasMethod.call(AliasMethod.java:61)
    at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:168)
    at org.jruby.ast.FCallOneArgNode.interpret(FCallOneArgNode.java:36)
    at org.jruby.ast.ReturnNode.interpret(ReturnNode.java:92)
    at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
    at org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
    at org.jruby.ast.IfNode.interpret(IfNode.java:116)
    at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
    at org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
    at org.jruby.ast.RescueNode.executeBody(RescueNode.java:221)
    at org.jruby.ast.RescueNode.interpret(RescueNode.java:116)
    at org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
    at org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:182)
    at org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:203)
    at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:326)
    at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:170)
    at dltest.block_0$RUBY$__file__(dltest.rb:4)
    at dltest$block_0$RUBY$__file__.call(dltest$block_0$RUBY$__file__)
    at org.jruby.runtime.CompiledBlock19.yield(CompiledBlock19.java:159)
    at org.jruby.runtime.CompiledBlock19.call(CompiledBlock19.java:87)
    at org.jruby.runtime.Block.call(Block.java:101)
    at org.jruby.RubyProc.call(RubyProc.java:290)
    at org.jruby.RubyProc.call(RubyProc.java:228)
    at org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:99)
    at java.lang.Thread.run(Thread.java:745)

Found 1 deadlock.

@mkristian
Copy link
Member

@bignacio very much appreciated the new reproduction case. I do have some fix for the first deadlock BUT I do see some IllegalMonitorStateException from time to time.

mkristian added a commit that referenced this issue Oct 5, 2015
this eventually fixes #3341 in the sense that both testcases
do not produce deadlocks anymore.

Sponsored by Lookout Inc.
@headius
Copy link
Member

headius commented Oct 5, 2015

The general case of two threads requiring two files in opposite order can't be fixed. This is a flaw in how Ruby locks around files, and it's possible to deadlock CRuby in the same way. The only fix for that would be to have a single global lock for all requires, serializing them across all threads.

I suspect that the autoload deadlock may just be another flavor of the same problem.

I'm looking over the fixes @mkristian made in #3370 and I am confused about a few things:

  • Switching to a reentrant lock should not make any difference; Java's object monitor synchronization is already reentrant.
  • The logic appears to spin for a few attempts to lock the ctxLock, but I don't see what it does if it is unable to acquire the lock during that time.

If the spinning eventually just gives up, that's not proper semantics because the file in question has not been loaded. I suspect it "fixes" the deadlock because the failed thread will now back all the way out of its locks, allowing the other thread to proceed.

@mkristian
Copy link
Member

running those two "test" with MRI there are no deadlocks but with warning enabled you see

/usr/local/Cellar/ruby/2.2.3/lib/ruby/2.2.0/rubygems/core_ext/kernel_require.rb:54: warning: loading in progress, circular require considered harmful - /Users/cmeier/projects/active/jruby/deadlock/test2/class_a.rb
    from dltest.rb:10:in  `block in <main>'
    from /usr/local/Cellar/ruby/2.2.3/lib/ruby/2.2.0/rubygems/core_ext/kernel_require.rb:54:in  `require'
    from /usr/local/Cellar/ruby/2.2.3/lib/ruby/2.2.0/rubygems/core_ext/kernel_require.rb:54:in  `require'
    from /Users/cmeier/projects/active/jruby/deadlock/test2/class_b.rb:2:in  `<top (required)>'
    from /usr/local/Cellar/ruby/2.2.3/lib/ruby/2.2.0/rubygems/core_ext/kernel_require.rb:54:in  `require'
    from /usr/local/Cellar/ruby/2.2.3/lib/ruby/2.2.0/rubygems/core_ext/kernel_require.rb:54:in  `require'

i.e. warning: loading in progress, circular require considered harmful for both examples.

with #3370 jruby produces:

/Users/cmeier/projects/active/jruby/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:54: warning: loading in progress, circular require considered harmful - /Users/cmeier/projects/active/jruby/deadlock/test2/class_c.rb
             require at org/jruby/RubyKernel.java:939
             require at /Users/cmeier/projects/active/jruby/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:54
               <top> at /Users/cmeier/projects/active/jruby/deadlock/test2/class_b.rb:3
             require at org/jruby/RubyKernel.java:939
              (root) at /Users/cmeier/projects/active/jruby/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:1
  block in dltest.rb at dltest.rb:10

I learned that this deadlock situation can not be solved without having a global require lock - but MRI needs to be first doing it.

@bignacio probably a "wont fix" but for me getting closer to MRI would help.

@bignacio
Copy link
Author

bignacio commented Oct 9, 2015

@mkristian Thanks for looking into this.

I'd risk proposing a scenario where a deadlock could happen even with a global require lock - given the fact you can require anytime, anywhere in your code.

# file1.rb
def do_something
  some_global_mutex.synchronize {
    require 'file2'
  }
end

# file2.rb
require 'file3'

# file3.rb
some_global_mutex.synchronize {
  require 'file1'
}

now, this example is pure bad coding but perhaps in a more complex sequence of requires, using third-party code, the unadvised developer could run into cases like this.

It's a tough problem to solve properly and elegantly and still support the same functionalities.

@headius
Copy link
Member

headius commented Oct 22, 2015

@bignacio Assuming those were not mutexes but instead monitors (reentrant locking) there's no deadlock here. The second synchronize in "file3" would have been acquired already.

It also would not deadlock with multiple threads, since only the first thread to acquire the lock would continue to finish its loading.

The only place where a single require lock would cause the system to deadlock is when a require or load never returns. For example, a server that you start by requiring a file.

I don't know how common this is in Ruby-land, but I'd hope it's pretty rare.

@headius
Copy link
Member

headius commented Aug 19, 2016

I'm sorry, we need to revert this change and reexamine the problem. This change causes require to allow all contending threads to proceed without waiting for a require to complete (#4091).

And my initial re-examination of this bug points out a big problem to me: if two threads require two files in different orders, and those files depend on each other, there's no way to avoid a deadlock under the current model of Ruby require locking. So this bug may be a correct behavior of Ruby as it stands today.

@headius headius reopened this Aug 19, 2016
@headius
Copy link
Member

headius commented Aug 19, 2016

Specifically, in this example it is not possible to be thread-safe: #3341 (comment)

a requires b. That requires acquiring the lock for a and then the lock for b.
b requires a. That requires acquiring the lock for b and then the lock for a.

So we have inconsistent lock ordering. This is the simplest example of a deadlock.

The only fix for this is to have a single lock for requiring all files, and only one thread gets to do requires at a time. JRuby used to have such a mode; I will check that it still works, so it can be a workaround for systems with badly-structured require trees.

@headius
Copy link
Member

headius commented Aug 20, 2016

Until convinced otherwise, I'm calling this one Won't Fix. It is not possible to avoid deadlocks in Ruby requires with their current specification. Don't require interdependent files in different orders from different threads.

I will look into deadlock detection, to see if we can find a way to error out rather than blocking.

@headius headius closed this as completed Aug 20, 2016
@headius headius modified the milestones: Won't Fix, JRuby 9.1.0.0 Aug 20, 2016
@headius
Copy link
Member

headius commented Aug 20, 2016

I have written a naive deadlock detector that will error out if it sees two threads with conflicting require locks. I do not believe it will work for anything more than two threads right now.

https://gist.github.com/headius/6a86bae2556d03b5fa235d6df695278c

The logic here basically spins attempting to acquire the lock. Initially, and for each spin that the lock fails to be acquired, it checks for deadlocks. Deadlock checking involves looking at the thread that owns the require lock and determining if it is waiting on a lock itself. If so, and that lock appears in the list of locks the current thread holds, we have a two-way deadlock.

There are algorithms for doing N-way deadlock detection but they get very complicated. I am not sure if we want that in the require path...but then again, it will only be needed when two threads contend a single require lock. It may be unlikely to affect anyone if the deadlock detection is heavier.

headius added a commit that referenced this issue Aug 20, 2016
This detector works only with two threads, using the following
heuristic:

1. If the lock is already acquired, obtain its owner thread.
2. Check if the owner thread is waiting on a lock.
3. If the lock the owner is waiting on is held by the current
   thread, raise an error.
4. If there's no owner and no deadlock, try to lock the thread for
   500ms + rand(100)ms.
5. If the file has been locked, return success.
6. Otherwise, start over at 1.

We may want a more advanced detector that can handle arbitrarily
many threads, but it would be much heavier and require scanning
all threads in the system, including those unrelated to the
current program and those not actually trying to require files.

See #3341.
headius added a commit to headius/jruby that referenced this issue Sep 30, 2016
This detector works only with two threads, using the following
heuristic:

1. If the lock is already acquired, obtain its owner thread.
2. Check if the owner thread is waiting on a lock.
3. If the lock the owner is waiting on is held by the current
   thread, raise an error.
4. If there's no owner and no deadlock, try to lock the thread for
   500ms + rand(100)ms.
5. If the file has been locked, return success.
6. Otherwise, start over at 1.

We may want a more advanced detector that can handle arbitrarily
many threads, but it would be much heavier and require scanning
all threads in the system, including those unrelated to the
current program and those not actually trying to require files.

See jruby#3341.
headius added a commit that referenced this issue Nov 8, 2016
This detector works only with two threads, using the following
heuristic:

1. If the lock is already acquired, obtain its owner thread.
2. Check if the owner thread is waiting on a lock.
3. If the lock the owner is waiting on is held by the current
   thread, raise an error.
4. If there's no owner and no deadlock, try to lock the thread for
   500ms + rand(100)ms.
5. If the file has been locked, return success.
6. Otherwise, start over at 1.

We may want a more advanced detector that can handle arbitrarily
many threads, but it would be much heavier and require scanning
all threads in the system, including those unrelated to the
current program and those not actually trying to require files.

See #3341.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants