Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NullPointerException in RubyArray.join #2800

Closed
e2 opened this issue Apr 3, 2015 · 11 comments
Closed

NullPointerException in RubyArray.join #2800

e2 opened this issue Apr 3, 2015 · 11 comments

Comments

@e2
Copy link

e2 commented Apr 3, 2015

Just reporting this since it occurred.

@8d7c1b1

Exception in thread "Ruby-0-Thread-6: /home/me/foo.rb:274" java.lang.NullPointerException
at org.jruby.RubyArray.join19(RubyArray.java:1775)
at org.jruby.RubyArray$INVOKER$i$join19.call(RubyArray$INVOKER$i$join19.gen)
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:161)
at org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:288)
at org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:74)
at org.jruby.ir.interpreter.Interpreter.INTERPRET_BLOCK(Interpreter.java:137)
at org.jruby.runtime.InterpretedIRBlockBody.commonYieldPath(InterpretedIRBlockBody.java:109)
at org.jruby.runtime.IRBlockBody.call(IRBlockBody.java:75)
at org.jruby.runtime.Block.call(Block.java:106)
at org.jruby.RubyProc.call(RubyProc.java:305)
at org.jruby.RubyProc.call(RubyProc.java:232)
at org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:99)
at java.lang.Thread.run(Thread.java:745)

I'm not sure what I'm doing yet - just accessing an array inside a mutex lock, inside a thread (which has the same mutex lock). I'll try and reproduce this - let me know if that's necessary (can't afford a lot of time on it).

jruby 9.0.0.0-SNAPSHOT (2.2.1) 2015-03-30 @8d7c1b1 OpenJDK 64-Bit Server VM 24.75-b04 on 1.7.0_75-b13 +jit [linux-amd64]

@e2 e2 changed the title NullPointerException in NullPointerException in RubyArray.join Apr 3, 2015
@e2
Copy link
Author

e2 commented Apr 3, 2015

This isn't exactly master's HEAD, but I didn't see relevant changes since this commit.

@e2
Copy link
Author

e2 commented Apr 3, 2015

I'll update to master and see if I can reproduce it.

@headius
Copy link
Member

headius commented Apr 3, 2015

It does look like a concurrency problem. The line in question blows up because a value in the array is null; this should not happen under normal circumstances since arrays are usually filled with nils at allocation time to avoid null checking.

Do make certain that all accesses of this array are done under lock. Both reads and writes generally need to be under lock because writes can break other writes or other reads.

@e2
Copy link
Author

e2 commented Apr 3, 2015

I updated to: jruby 9.0.0.0-SNAPSHOT (2.2.1) 2015-04-03 82a6227 OpenJDK 64-Bit Server VM 24.75-b04 on 1.7.0_75-b13 +jit [linux-amd64]

The above still occurred occasionally.

I stumbled onto this as well:

Exception in thread "Ruby-0-Thread-7: /home/me/foo.rb:274" java.lang.ArrayIndexOutOfBoundsException: 7
        at org.jruby.RubyArray.collect(RubyArray.java:2294)
        at org.jruby.RubyArray.map19(RubyArray.java:2307)
        at org.jruby.RubyArray$INVOKER$i$0$0$map19.call(RubyArray$INVOKER$i$0$0$map19.gen)
        at org.jruby.internal.runtime.methods.JavaMethod$JavaMethodZeroBlock.call(JavaMethod.java:472)
        at org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:77)
        at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:83)
        at org.jruby.ir.instructions.CallBase.interpret(CallBase.java:410)
        at org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:320)
        at org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:74)
        at org.jruby.ir.interpreter.Interpreter.INTERPRET_BLOCK(Interpreter.java:137)
        at org.jruby.runtime.InterpretedIRBlockBody.commonYieldPath(InterpretedIRBlockBody.java:109)
        at org.jruby.runtime.IRBlockBody.call(IRBlockBody.java:75)
        at org.jruby.runtime.Block.call(Block.java:106)
        at org.jruby.RubyProc.call(RubyProc.java:305)
        at org.jruby.RubyProc.call(RubyProc.java:232)
        at org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:99)
        at java.lang.Thread.run(Thread.java:745)

This happened with about 10 threads running (and waiting Queue#pop), and the eleventh one being created.

@e2
Copy link
Author

e2 commented Apr 3, 2015

Do make certain that all accesses of this array are done under lock. Both reads and writes generally need to be under lock because writes can break other writes or other reads.

Technically, since I'm fixing a crappy implementation, I don't mind borked values in an array, or Ruby exceptions - as long as I'm "protected" from exceptions in Java.

Should I keep reporting things like this here? (Don't want to spam, but don't want to leave potential problems unreported).

@e2
Copy link
Author

e2 commented Apr 3, 2015

By array, does it mean thread locals here?

Here's a piece of the code:

      queue = Queue.new
      thread = Thread.new do  # line reported (274)
        Thread.current[:foo] = :foo
        queue.pop
        Thread.current[:foo] = :bar      

@e2
Copy link
Author

e2 commented Apr 3, 2015

Just updated the above snippet - sorry

@headius
Copy link
Member

headius commented Apr 3, 2015

The line reported in the exception message is just telling you where that thread was created. That's the thread that eventually is seeing a null result.

Are you having trouble finding where in your code it is breaking? Judging by the stack trace, it's that thread from line 274 calling Array#map or collect this time. I can't see the rest of the code, but it's a call happening directly in the thread's block, so you should be able to figure out what array is triggering this.

You may get more info in the JVM trace if you pass -Xjit.threshold=0 to JRuby. This will attempt to compile all methods immediately, so they show up at the JVM level.

And yes, you are quite right to report this. We treat JVM exceptions like crashes. This case is probably triggered by concurrent Array modification, but we'll help you find the problem anyway.

@e2
Copy link
Author

e2 commented Apr 3, 2015

Are you having trouble finding where in your code it is breaking?

I'm actually debugging something else (on a higher level) - more related to logic and race conditions, not crashes.

There's too much stuff happening in the code, so it would take me lots of time to "shave out" an example - and I still have no guarantee I'd be able to reproduce the problem (due to other unrelated race conditions). Which is why I don't think it's worth the time - unless you think otherwise.

I'll let you know if I find anything specific to this issue.

That's the thread that eventually is seeing a null result.

If it helps, the code for the failing class I'm using goes something like this (just to get the gist):

class Foo
  def initialize
    @mutex = Mutex.new
    @abc = []
  end

  def foo(thread)
    @mutex.synchronize do
      @abc.delete(thread)
      @abc.map { |a| a.object_id }
      @abc.push(thread)
      @abc.size
    end
  end

  def new_thread
    queue = Queue.new
    thread = Thread.new do  # line reported (274)
      queue.pop
      foo(thread) # with Xjit.threshold I presume the line number will be here
    end
    queue << nil
  end
end

@headius
Copy link
Member

headius commented Apr 3, 2015

In order for there to be a concurrency issue, some other thread would need to be accessing the @abc array without going through the mutex. You should check that every piece of code that might touch that array (for read or write) has appropriate locking.

The jit.threshold flag will basically turn interpreted frames in the Java stack trace into .rb frames, so we can see the whole stack with Ruby included.

I'm in IRC (Freenode, #jruby) if you want to work with me a bit more interactively.

@e2
Copy link
Author

e2 commented Apr 3, 2015

Closing, since the workaround is to use locking. And Xjit.threshold=0 helps locate the issue otherwise.

@e2 e2 closed this as completed Apr 3, 2015
@headius headius added this to the Invalid or Duplicate milestone Apr 6, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants