FFI/LibCurl buffer overflow under heavy load. #752
Comments
Is it possible to write a test case that reproduces this? Since it is on the curl side, it would probably just be something that spawns off a lot of requests in separate threads, I suspect, but extracting the way your code sets it up would be helpful.
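For concreteness, a reproduction along those lines might be as simple as the sketch below (placeholder URL, arbitrary thread and request counts; assumes the typhoeus gem running on JRuby):

```ruby
require 'typhoeus'

# Crude stress test: many threads, each firing a stream of requests at a
# local endpoint, to see whether the native (FFI/libcurl) layer trips.
URL = 'http://127.0.0.1:4567/ping' # placeholder endpoint

threads = Array.new(20) do
  Thread.new do
    500.times { Typhoeus.get(URL) }
  end
end
threads.each(&:join)
```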
Sure thing, I won't get a chance until the weekend, but I will look into that and post back here.
No hurry ... I'm hoping to look at it sometime next week. I'm assuming it's a bug in the way FFI reclaims memory allocations.
Is there anything new with this? I get this error after ~7000 requests (200 in parallel). It works perfectly fine with MRI. My env:
@leifg We never got a reproduction from OP, so this did not move forward. The backtrace also indicates it's blowing up well inside libcurl, so I am wondering if perhaps libcurl isn't thread-safe for all uses. If it's in JRuby/FFI, I'd suspect some data structure improperly shared across threads, like a pre-allocated struct of some kind. It's also possible that the autopointer code has some threading issue that's causing a pointer to be shared or obliterated too soon.
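As a rough illustration of the autopointer mechanism being speculated about (not JRuby's or Ethon's actual internals): an FFI::AutoPointer frees its native memory once the Ruby wrapper is garbage-collected, so a wrapper shared across threads and released too early would leave another thread writing into freed memory.

```ruby
require 'ffi'

# Minimal AutoPointer example using libc malloc/free; illustrative only.
module LibC
  extend FFI::Library
  ffi_lib FFI::Library::LIBC
  attach_function :malloc, [:size_t], :pointer
  attach_function :free,   [:pointer], :void
end

class ManagedBuffer < FFI::AutoPointer
  # Called by FFI when the Ruby wrapper object is garbage-collected.
  def self.release(ptr)
    LibC.free(ptr)
  end
end

buf = ManagedBuffer.new(LibC.malloc(256))
# If another thread still held buf's raw address after this wrapper was
# collected and released, its writes would land in freed memory.
```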
Interestingly, here's a report with a nearly identical error, but in this case it's on MRI using curb: sidekiq/sidekiq#1400. I'm starting to think this is a libcurl issue or bad libcurl usage by some library.
OK, I'm having trouble figuring out what library actually uses Ethon. The current theory is that the user is sharing a "multi" instance across threads. The libcurl documentation explicitly warns against sharing handles across threads, probably because internal buffers might be overflowed (as is the case here). I could really use some kind of reproduction, or at least a Gemfile and help investigating who might be the bad libcurl consumer.
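To make that theory concrete, the suspected anti-pattern would look roughly like this (hypothetical URL; Typhoeus shown, but the same applies to driving a raw Ethon::Multi from several threads):

```ruby
require 'typhoeus'

# Anti-pattern under suspicion: one Hydra (which wraps a libcurl "multi"
# handle) driven from several threads at once. libcurl handles are not
# safe to share across threads, so concurrent #run calls can corrupt the
# handle's internal buffers.
SHARED_HYDRA = Typhoeus::Hydra.new

threads = Array.new(10) do
  Thread.new do
    100.times do
      SHARED_HYDRA.queue(Typhoeus::Request.new('http://127.0.0.1:4567/ping'))
      SHARED_HYDRA.run
    end
  end
end
threads.each(&:join)
```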
@headius thanks for looking into this. There is a possibly related issue here: typhoeus/ethon#79, but again without code. :(
Hi, OP here - sadly I don't have the code that was triggering this any more, although I can say that there was something funny going on with the combination of that usage and TorqueBox's message processors, as the code itself never presented issues when run directly in JRuby. I'm guessing that you've got the cause there @headius - it must have been something to do with incorrect thread handling and inadvertently sharing an instance of the Typhoeus hydra. Sorry I can't contribute actual code.
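For anyone who lands here with the same symptom, the usual workaround is to confine each hydra to a single thread so that no libcurl handle is ever touched concurrently; a minimal sketch (hypothetical URL):

```ruby
require 'typhoeus'

# One Hydra per thread: each thread owns its own libcurl multi handle.
threads = Array.new(10) do
  Thread.new do
    hydra = Typhoeus::Hydra.new
    5.times { hydra.queue(Typhoeus::Request.new('http://127.0.0.1:4567/ping')) }
    hydra.run
  end
end
threads.each(&:join)
```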
OK, I created a repo to reproduce it. It's here. However, I'm using this version of curl on an Ubuntu system, which is not the newest version.
Due to the age of this issue, and the fact that I ran the repo from the last reporter without seeing a crash, I will assume this is working or is just so old it no longer triggers (although I am using JRuby 9k instead of 1.7.x, since we are EOL'ing 1.7.x soon). If anyone has more knowledge on this issue, please open a new issue for it.
This spec segfaults on the latest JRuby (9.2.19.0) on the JVM: https://app.circleci.com/pipelines/github/airbrake/airbrake/227/workflows/3db05cae-87d8-4599-9da2-a37c4c76bf4f/jobs/9670
During an extensive debugging session I haven't been able to identify the root cause. I found an identical report from 2012, but it had no resolution: typhoeus/typhoeus#202. I also found related links:
* jruby/jruby#231
* jruby/jruby#752
I mentioned this as part of #231 but reopening as a new issue.
Situation:
Server:
JRuby:
Java (Oracle):
Affected Gems:
Stack Trace:
https://gist.github.com/jgwmaxwell/5621074
Outline:
Running Typhoeus with the #in_parallel request grouping method in a TorqueBox 'Processor' works fine until the load gets pretty high, at which point it blows up with the stack trace above. The problems come when I've got 10 threads, each dispatching 3-5 requests in parallel, running approximately 1.5k 'jobs' per minute between them, or 4.5k-7.5k req/min.
The server is fine on all RAM meters, isn't swapping, and CPU is sitting around 85% (this is also serving web requests, servicing STOMP clients and doing a few other things at the same time).
It's not a production critical issue for us, as we can simply swap Typhoeus out, but thought it might be interesting to raise!
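For readers unfamiliar with that setup, roughly what each of those 'jobs' does is sketched below, written against the plain Hydra queue/run API rather than the #in_parallel wrapper (hypothetical URLs; ten such processors run concurrently inside TorqueBox):

```ruby
require 'typhoeus'

# Roughly what one "job" does: batch a handful of requests and run them
# in parallel, then collect the responses.
def run_job(urls)
  hydra    = Typhoeus::Hydra.new
  requests = urls.map { |u| Typhoeus::Request.new(u) }
  requests.each { |r| hydra.queue(r) }
  hydra.run
  requests.map(&:response)
end

run_job(Array.new(4) { 'http://127.0.0.1:4567/ping' })
```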