Simultaneous client reads / writes to the same socket from different threads can deadlock #4854
I suspect this is bad lock management in the socket subsystem. Most operations in IO have been audited to acquire and release all appropriate locks, always in the same order. Socket has not had such an audit recently. Diagnosis: a straight-up deadlock on the OpenFile read/write lock, probably due to improper locking in the socket library.
For now, a workaround would be to lock around these IO objects yourself. It may be enough to …
There are two changes here, and the first is the more likely cause of #4854:
1. While still holding the OpenFile lock, we proceeded to attempt to lock the Selector being selected. If these locks are acquired in a different order elsewhere, that causes a deadlock. Fixed by unlocking the OpenFile lock before locking the Selector lock.
2. OpenFile.removeBlockingThread synchronized on both OpenFile's lock and the OpenFile instance itself. I could find only a handful of other places that synchronize on the OpenFile instance, and they do not appear to lock anything else, but there's no reason to synchronize on the instance here anyway.
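The lock-ordering idea behind the first change can be illustrated with a small Ruby sketch. Two Mutex objects stand in for JRuby's OpenFile lock and Selector lock; the constant and method names (OPENFILE_LOCK, SELECTOR_LOCK, select_after_fix, opposite_order_path) are hypothetical, and the real fix lives in JRuby's Java code, not in anything like this. The point it shows: a path that releases the first lock before taking the second never holds both at once, so it cannot form a cycle with a path that takes the locks in the opposite order.

```ruby
# Hypothetical stand-ins for the two locks involved in #4854.
OPENFILE_LOCK = Mutex.new
SELECTOR_LOCK = Mutex.new

# Before the fix (pattern only, do not use): the Selector lock was taken while
# the OpenFile lock was still held. Combined with opposite_order_path below,
# two threads can each hold one lock and wait forever for the other:
#   OPENFILE_LOCK.synchronize { SELECTOR_LOCK.synchronize { ... } }

# After the fix: finish the work that needs the OpenFile lock, release it,
# and only then take the Selector lock. This path never holds both locks.
def select_after_fix(prepare, on_selector)
  state = OPENFILE_LOCK.synchronize { prepare.call }
  SELECTOR_LOCK.synchronize { on_selector.call(state) }
end

# Some other code path that acquires the locks in the opposite order.
def opposite_order_path(work)
  SELECTOR_LOCK.synchronize do
    OPENFILE_LOCK.synchronize { work.call }
  end
end
```

Running both paths concurrently in a loop completes, because at most one thread ever waits while holding a lock the other needs. Swapping in the "before" pattern for select_after_fix is what can hang.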
Easy enough problem to find: we were holding two locks that might be acquired in a different order elsewhere. I made sure to unlock one of them first, so it should at least avoid the deadlock you found.
I am not sure how best to create a test for this, given the complexity of the original script and the difficulty of testing for deadlocks in CI. If you have ideas, please feel free to submit a PR against spec/ruby or test/jruby.
Environment
JRuby version (jruby -v) and command line (flags, JRUBY_OPTS, etc.): 9.1.13.0, 9.1.14.0, maybe others?
Operating system and platform (e.g. uname -a): Windows 10, x86_64
Expected Behavior
When a client interleaves blocking calls to read data and write data to the same TCP socket from different threads, the client should eventually complete the read and write calls with no errors. This is what I've seen with the script below on MRI Ruby 2.4.2, running the script indefinitely. On JRuby 9k, however, the process deadlocks after a few seconds and never seems to recover.
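The scenario above can be approximated with a minimal sketch. This is not the reporter's socket_sample.rb; the method name run_interleaved, the "ping\n" message, and the 10-second timeout are assumptions made for illustration. One client socket is written by one thread and read by another with no synchronization, while the server side handles its end of the connection from a single thread. A hang shows up as Thread#join(limit) returning nil, which is also one way a deadlock check could be bounded in CI.

```ruby
require "socket"

# Hypothetical reproducer sketch: unsynchronized reads and writes on one
# client socket from two threads. Returns the number of lines the client
# read, or nil if any thread did not finish within `timeout` seconds.
def run_interleaved(iterations, message = "ping\n", timeout: 10)
  server = TCPServer.new("127.0.0.1", 0) # port 0: let the OS pick a free port
  port = server.addr[1]

  # Server side: a single thread reads each line and echoes it back.
  server_thread = Thread.new do
    conn = server.accept
    iterations.times { conn.write(conn.gets) }
    conn.close
  end

  client = TCPSocket.new("127.0.0.1", port)

  # Client side: blocking writes in one thread, blocking reads in another.
  writer = Thread.new { iterations.times { client.write(message) } }
  reader = Thread.new { Array.new(iterations) { client.gets }.compact.size }

  # Thread#join(limit) returns nil if the thread is still alive after the limit.
  finished = [writer, reader, server_thread].all? { |t| t.join(timeout) }
  finished ? reader.value : nil
ensure
  client&.close
  server&.close
end
```

On MRI, run_interleaved(1_000) should return 1000. Whether this simplified version triggers the hang on the affected JRuby versions has not been verified; the original script may exercise a different wait path (for example, the waitWritable case described below).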
Here's the script I've used to reproduce this problem. I just ran jruby.exe .\socket_sample.rb to launch the script. The script opens a server socket and accepts a single connection from a client. It performs interleaved read/write calls on the client side of the socket (with no synchronization). On the server side, reads and writes are performed as well, but all from a single thread.
Actual Behavior
Running with MRI Ruby 2.4.2, the script appears to run fine indefinitely, with messages like the following in the output:
Running with JRuby 9.1.14.0, however, the client read/write threads appear to deadlock one another after only a few seconds of running. The console output looks like the following:
Running jstack against the Java process for a JRuby 9.1.14.0 run after the 'waiting for data to read...' message above, I see the following:
The reader thread has taken a lock on the channel within the RubyThread.select method but is waiting to take a lock (which the writer holds) within the OpenFile.lock() method. The writer thread has taken the lock held by the OpenFile object but is waiting to take the lock on the channel (which the reader holds) within the RubyThread.select method.

I've seen this occur not just in cases where the writer thread encounters a waitWritable condition but also in cases where the writer thread is blocked on a call to the synchronized OpenFile.unread method, which cannot proceed because the reader thread is in the middle of executing the synchronized OpenFile.removeBlockingThread method. In that case, the reader thread is unable to obtain the reentrant lock within OpenFile because the writer thread already holds it, leading to another form of deadlock. I haven't yet been able to get a narrower code sample that reproduces this second issue as frequently.

If a mutex is synchronized around the client read and write calls, the deadlocks no longer occur, even on JRuby.
This seems like a reasonable workaround, although possibly with some performance cost. I'm not sure, but MRI Ruby may be less prone to this kind of problem because of the GIL.
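A minimal sketch of the mutex workaround follows, assuming the client uses non-blocking calls; the original workaround code is not reproduced here, and the names CLIENT_IO_LOCK, locked_write, and locked_read are hypothetical. Only the individual read_nonblock/write_nonblock calls are guarded, and the waits (IO.select) happen outside the lock. Holding one mutex across a blocking read can itself deadlock: if the reader waits for a reply while holding the lock, the writer can never send the request that produces that reply. Whether waiting outside the lock is enough on the affected JRuby versions is an assumption, not something verified here.

```ruby
require "socket"

# Hypothetical workaround sketch: one Mutex serializes all socket calls made
# by the client's threads. Waiting for readiness happens outside the lock.
CLIENT_IO_LOCK = Mutex.new

def locked_write(sock, data)
  until data.empty?
    # Returns the number of bytes written, or :wait_writable.
    result = CLIENT_IO_LOCK.synchronize { sock.write_nonblock(data, exception: false) }
    if result == :wait_writable
      IO.select(nil, [sock]) # wait for writability without holding the lock
    else
      data = data.byteslice(result..-1) # keep the bytes not yet written
    end
  end
end

def locked_read(sock, maxlen)
  loop do
    # Returns a String, :wait_readable, or nil at end of file.
    result = CLIENT_IO_LOCK.synchronize { sock.read_nonblock(maxlen, exception: false) }
    if result == :wait_readable
      IO.select([sock]) # wait for data without holding the lock
    else
      return result
    end
  end
end
```

The trade-off is that every read and write on the client socket now contends for one lock, which matches the possible performance cost mentioned above.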