-
-
Notifications
You must be signed in to change notification settings - Fork 925
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JRuby thread stuck on IntHashMap.rehash #3394
Comments
That's very strange. Nothing should block in that map, so it sounds like something has put it into an unstable state that loops forever. I'm not surprised you can't reproduce it. Can you build JRuby? Perhaps you can stop by IRC tomorrow during the day US time and we'll help you look at it? It might be easiest to attach a debugger or console to see what that thread is actually doing. |
@kares I was just looking quickly, and I don't see any synchronization around our overload resolution logic in JI. Can you confirm? |
@headius no sync indeed ... adding the |
simplest thing would be |
Without an atomic solution like ConcurrentHashMap, we will need to synchronize both puts and gets, since gets could see an unstable state in the data structure if a put is active. |
This needs to be resolved before 9.0.2.0. |
The move to IntHashMap in RubyToJavaInvoker improved throughput for multiple-arity calls, but since IntHashMap is not synchronized this could cause corrupted data under concurrent load as seen in CallableSelector to encapsulate the cache and synchronize access to it. I believe this is a minimum effort to make the cache access safe and fix #3394. I have not measured the impact of this synchronization in a high concurrency setting. It may indeed turn out to be more overhead and a worse bottleneck than using ConcurrentHashMap with Integer objects. Unfortunately there's no middle ground here unless we build or adopt some concurrency-friendly int-based map. Another heavy option would be to maintain two references to IntHashMap in RubyToJavaInvoker, copying from a synchronized writable copy to a read-only copy after every write. This would double the size of the cache structures in memory but would provide a fast, unsynchronized cache for read-mostly signature lookups. This pattern is also easier to support with the refactoring in this commit.
@banker The fixed IntHashMap usage has been pushed to master and will be in 9.0.2. |
Thanks for looking into this, @headius. I'll build from master and re-run in my environment. Since the bug has taken up to 12 hours to appear, it will take some time for me to verify that the issue is resolved. I'll report back by Friday. |
Bad news, @headius. I ran one instance of my app overnight using the new code, and it's stuck again, but differently. All 8 of my threads are
Let me know if I can provide any further information, and thanks again for such a quick response for far! |
@banker I think this is the older code that had a lock involved, because the newest version does not have a lock (object monitor). Can you show me the |
Ahh actually I'm sorry...the new code does have a lock for writes, as in putSignature. But I'm very confused now why it would be hanging, since it's only a single lock, no other locks are involved, and reads should have no locking of their own. |
Right. The only thing I can add is that, anecdotally, this new code is much more likely to trigger this deadlock. |
@banker Harumph. Very confusing. It sounds like there may simply be a bug in our IntHashMap at this point. |
@banker Can you gist a full stack dump of your VM in this bad state? I would expect to see another thread inside IntHashMap that has acquired the lock. My theory at the moment is that IntHashMap just has a bug that causes it to endlessly loop, and the object monitor trace is just a side effect. |
Yeah, I also don't see how all threads could be stuck waiting for that lock. |
@headius Okay, it turns out that one thread was runnable, which I've included below. Can I email you the gist? Not completely comfortable making it public.
|
@banker Yeah go ahead, headius@headius.com. I'm also pushing a change to master to use a high-throughput, concurrency-friendly map implementation rather than using our own. It should be in nightly builds tomorrow morning. |
After a brief search (and some lazy Twittering) I was pointed at https://github.com/boundary/high-scale-lib, a maintained, maven-released fork of Cliff Click's high-throughput map implementations. They are fast, memory-efficient, concurrency-friendly, and non-blocking. The best part is that there's an implementation that takes a long key, so we can basically drop it in wherever we use IntHashMap. Here's perf results. Before (COW IntHashMap):
After (NonBlockingHashMapLong
Performance is essentially the same. I'm going to push the change, since the library only adds about 100kB to JRuby. |
Reopening one last time. If the new collection works, then it would indicate there's a bug in IntHashMap. In that case, we should also reimpl IntHashMap in terms of NonBlockingHashMapLong in both JRuby 1.7.x and 9k. |
@headius just a note (in case you are unaware of it and end up using Cliff's hash map elsewhere in JRuby), it leaks deleted keys (only keys not values). Obviously, this might not matter depending on use case (no deletions) or, as is the case with |
@thedarkone That's good information, thank you. I think we'll probably still be using CHM for our Object:Object maps but this will be a good replacement for IntHashMap, which obviously seems to have some issues. |
@banker Hopefully this has fixed your issues. This change will go out in the release today. I'm going to optimistically close this. |
@headius I'll build again, run, and get back to you. I hope it's fixed, too! Currently just restarting the app on a cron to preemptively avoid :-\ |
@banker or anyone ... could you confirm its behaving properly - considering back-porting to jruby-1_7 if so |
This should hopefully fix jruby#3394. Conflicts: core/pom.rb core/src/main/java/org/jruby/java/invokers/RubyToJavaInvoker.java
…avaInvoker-s ... introducing an internal CallableCache interface to loose coupling in CallableSelector aligned with main fix for jruby#3394
…avaInvoker-s ... introducing an internal CallableCache interface to loose coupling in CallableSelector aligned with main fix for #3394
This should hopefully fix jruby#3394. Conflicts: core/pom.rb core/src/main/java/org/jruby/java/invokers/RubyToJavaInvoker.java
…avaInvoker-s ... introducing an internal CallableCache interface to loose coupling in CallableSelector aligned with main fix for jruby#3394 this commit also includes pieces of da1b849 and so Conflicts: core/src/main/java/org/jruby/java/dispatch/CallableSelector.java core/src/main/java/org/jruby/java/invokers/RubyToJavaInvoker.java core/src/main/java/org/jruby/javasupport/Java.java
Also released in 1.7.23. |
I have a multi-threaded application running on
JRuby 9.0.1.0 (Java 1.8.0_45-b14)
. The application runs with 8 Ruby threads. One of those threads has been stuck on theIntHashMap.rehash
method for the last 13 hours. The remaining threads are all functioning normally. Here is (most of) the stack trace for the thread in question as reported byjstack
:I cannot yet reliably reproduce this, but it does reliably occur once the app has been running for a few hours.
The text was updated successfully, but these errors were encountered: