Errno::EAGAIN thrown by SSLSocket#connect_nonblock #93
Thanks Mohamed, unfortunately I'm not sure what someone would run into trying to fix this without reproducing it.
Now that I think about it, how would …
Jeez, ignore that last one, I obviously know nothing about nonblocking I/O coding :/ After reading a bunch and looking at the code, I'm guessing that if a socket would block, it should always be either wait readable or wait writable, and there shouldn't ever be a generic "this socket would block on something but I can't tell you if it's reading or writing", correct? I'm trying to go through the jruby code myself and see if I can't spot where those …
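That understanding matches MRI's nonblocking API: a would-block condition surfaces as `IO::WaitReadable` or `IO::WaitWritable` (exception classes that mix the `Errno` in), never as a bare, directionless `Errno::EAGAIN` the caller must rescue. A minimal sketch of the standard pattern, using a plain TCP connect to a throwaway local server (the server and port here are illustrative, not from the issue):

```ruby
require "socket"

# Throwaway local server so the connect has somewhere to go (port 0 = any free port).
server = TCPServer.new("127.0.0.1", 0)
port = server.addr[1]

sock = Socket.new(:INET, :STREAM)
addr = Socket.sockaddr_in(port, "127.0.0.1")
begin
  sock.connect_nonblock(addr)
rescue IO::WaitWritable
  # Connect is in progress: wait for writability, then call again to finish.
  IO.select(nil, [sock])
  begin
    sock.connect_nonblock(addr)
  rescue Errno::EISCONN
    # Expected: the socket is already connected.
  end
end
puts sock.remote_address.ip_port == port
```

Note that the rescue clauses name the wait direction, not `Errno::EAGAIN` itself, which is exactly the contract the JRuby implementation appears to be violating here.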
I think I might have found what is causing this: if the …

Do we really need to execute the code for when blocking is false inside the …
I ended up implementing what I suggested in the previous comment and am currently running that code in production. However, I'm still seeing the EAGAIN errors, because waitSelect is still returning false here... I must be completely misunderstanding what …
So I'm guessing the selector that is retrieved with getSelectorPool might also be used by other threads, watching other channels? So that when …

If that sounds reasonable to someone who knows this code, I'd be willing to try it out and report back. (A little hesitant to try it in production until then, since I'm taking a shot in the dark here.)
Looking through jruby code, it looks like …
very nice session!
Ok, so I've been running code with the extra logging, and been getting some really weird results from this line. Many times there are indeed two keys in the set returned by …

So I think it's safe to say that we are indeed getting selectors that already have keys in them... I haven't been able to see where that would happen despite looking for a couple of hours... @headius, it looks like you were the original author of the SelectorPool class; any thoughts on what might cause this behavior?

One really weird thing I've seen is that I'm only getting these errors in my frontend Puma processes. My backend daemon, which makes a hell of a lot more http/s requests, hasn't shown any unexpected results at all so far (i.e. no output at all from here). Maybe there is some code in Puma, or something it relies on, that returns selectors to the pool without cancelling its keys? (I can't find anything in Puma itself, though.) In any case, it would probably be best to have …

Oh, I haven't seen any stacktraces from key cleanup, so I don't think it's that (it doesn't look like …
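While the selector bug is open, there's a defensive Ruby-side pattern worth noting: never treat a single "ready" report as a guarantee, and instead let the nonblocking call itself be the source of truth, looping on `IO::WaitReadable`. A sketch with a socketpair; `read_when_ready` is an illustrative helper name, not jruby or Net::HTTP code:

```ruby
require "socket"

# Loop select + read_nonblock so a spurious readiness report (e.g. from a
# selector carrying stale keys) just causes another wait instead of a
# leaked EAGAIN reaching the caller.
def read_when_ready(sock, maxlen)
  begin
    sock.read_nonblock(maxlen)
  rescue IO::WaitReadable
    IO.select([sock])
    retry
  end
end

a, b = UNIXSocket.pair
b.write("hello")
puts read_when_ready(a, 5)  # => hello
```

This doesn't fix the pool, but it means user code only blocks longer under the bug rather than crashing.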
I started running this patch in production to clean up keys in …
Looks like that did the trick! Looking through jruby code some more, I think this line might be the culprit: it's the only place that checks whether a key is valid before cancelling it, but the key can be invalid because the channel was closed, and perhaps in that case, if it doesn't get cancelled, it can still be reported by …

Anyway, I think the correct fix is to ensure the key-cleaning takes place, correctly, in …

@kares, unless you object, I think I'll also send in a pull request for a cleaned-up version of my patch to move waitSelect's nonblocking code out of the RubyThread.BlockingTask: I don't think it has anything to do with this bug, but it would probably still save a bit of overhead on a common task, I'd imagine.
A few times a day I'm getting `Errno::EAGAIN` thrown here in `Net::HTTP#connect`. I'm not sure if this is a fault of the ruby code not handling `Errno::EAGAIN`, or if MRI ruby's `SSLSocket#connect_nonblock` handles `Errno::EAGAIN` internally.

I'm running a patch that catches the `Errno::EAGAIN`, sleeps 0.01, and then retries, and it works just fine. If it's the case that the ruby code should really be handling this, then I can submit a fix to ruby and we can close this issue. If that's not the case, though, and this should be handled internally by jruby's SSLSocket implementation to match MRI's behavior, then the stack trace points to this error being thrown at: …

(I have no idea how to reproduce `Errno::EAGAIN` to see if MRI handles it or not, sorry!)