
FFI/LibCurl buffer overflow under heavy load. #752

Closed
jgwmaxwell opened this issue May 21, 2013 · 11 comments

@jgwmaxwell

I mentioned this as part of #231 but reopening as a new issue.

Situation:

Server:

TorqueBox 3.x.incremental.1606

JRuby:

1.7.3

Java (Oracle):

java version "1.7.0_21"
Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)

Affected Gems:

Typhoeus 0.6.3
Ethon 0.5.12
Faraday 0.8.7

Stack Trace:

https://gist.github.com/jgwmaxwell/5621074

Outline:

Running Typhoeus using the #in_parallel request grouping method in a TorqueBox 'Processor' works fine until the load gets pretty high, at which point it crashes with the stack trace above. The problems come when I've got 10 threads, each dispatching 3-5 requests in parallel, running approximately 1.5k 'jobs' per minute between them, or 4.5k-7.5k req/min.

The server is fine on all RAM meters, isn't swapping, and CPU is sitting around 85% (this is also serving web requests, servicing STOMP clients and doing a few other things at the same time).

It's not a production critical issue for us, as we can simply swap Typhoeus out, but thought it might be interesting to raise!

@ghost

ghost commented Jun 14, 2013

Is it possible to write a test case that reproduces this? Since it is on the curl side, it would probably just be something that spawns off a lot of requests in separate threads, I suspect, but extracting the way your code sets it up would be helpful.
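A minimal skeleton for such a stress test might look like the following (an assumption, not the OP's actual code): N worker threads, each dispatching batches of parallel requests in a loop. A stub lambda stands in for the Typhoeus `hydra.run` call so the skeleton runs without the gems or a live server; a real reproduction would queue `Typhoeus::Request` objects and run them on a hydra inside `dispatch_batch`.

```ruby
# Stress-test skeleton: 10 worker threads, each dispatching batches of
# 5 "parallel requests". dispatch_batch is a stand-in for Typhoeus'
# hydra.run (queue the batch, run it in parallel, collect results).
THREADS    = 10
BATCHES    = 20   # batches per thread
BATCH_SIZE = 5    # requests per batch

completed = Queue.new  # thread-safe counter of finished "requests"

# Stub for the parallel-request call; a real repro would build
# Typhoeus::Request objects here and run them on a hydra.
dispatch_batch = lambda do |size|
  size.times { completed << :ok }
end

workers = THREADS.times.map do
  Thread.new do
    BATCHES.times { dispatch_batch.call(BATCH_SIZE) }
  end
end
workers.each(&:join)

puts completed.size  # total simulated requests dispatched
```

Under real load the interesting variable is how the hydra objects are created and shared between those worker threads, which is where the discussion below ends up.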

@jgwmaxwell
Author

Sure thing, won't get a chance until the weekend but will look into that and post back here.

@ghost

ghost commented Jun 14, 2013

No hurry ... I'm hoping to look at it sometime next week. I'm assuming it's a bug in the way FFI reclaims memory allocations.

@ghost ghost self-assigned this Jun 17, 2013
@leifg

leifg commented Apr 9, 2014

Is there anything new with this? I get this error after ~7000 requests (200 in parallel).

Works perfectly fine with MRI

My env:

$ java -version
java version "1.7.0_25"
OpenJDK Runtime Environment (IcedTea 2.3.10) (7u25-2.3.10-1ubuntu0.12.04.2)
OpenJDK 64-Bit Server VM (build 23.7-b01, mixed mode)
$ ruby -v
jruby 1.7.11 (2.0.0p195) 2014-02-24 86339bb on OpenJDK 64-Bit Server VM 1.7.0_25-b30 [linux-amd64]

@headius
Member

headius commented Apr 9, 2014

@leifg We never got a reproduction from OP so this did not move forward. The backtrace also indicates it's blowing up well inside libcurl, so I am wondering if perhaps libcurl isn't thread-safe for all uses.

If it's in JRuby/FFI, I'd suspect some data structure improperly shared across threads, like a pre-allocated struct of some kind. It's also possible that the autopointer code has some threading issue that's causing a pointer to be shared or obliterated too soon.

@headius
Member

headius commented Apr 9, 2014

Interestingly, here's a report with a nearly identical error, but in this case it's on MRI using curb: sidekiq/sidekiq#1400

Starting to think this is a libcurl issue or bad libcurl usage by some library.

@headius
Member

headius commented Apr 9, 2014

Ok, I'm having trouble figuring out what library actually uses Ethon. Current theory is that the user is sharing a "multi" instance across threads. The libcurl documentation explicitly warns against sharing handles across threads, probably because internal buffers might be overflowed (as is the case here).

Really could use some kind of reproduction, or at least a Gemfile and help investigating who might be the bad libcurl consumer.
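If shared handles are indeed the culprit, the usual fix is one handle (i.e. one `Typhoeus::Hydra`, which wraps a libcurl "multi" handle) per thread rather than one shared across threads. A stdlib-only sketch of that pattern, with a plain `Object` standing in for the hydra so it runs without the gem:

```ruby
# Per-thread handle pattern: each thread lazily creates and reuses its
# own instance via thread-local storage, so no handle is ever touched
# by two threads. In a real app the per-thread object would be
# Typhoeus::Hydra.new (libcurl handles are not shareable across threads).
require "set"

def thread_local_hydra
  Thread.current[:hydra] ||= Object.new  # stand-in for Typhoeus::Hydra.new
end

ids = Queue.new
threads = 4.times.map do
  Thread.new do
    # Repeated calls on the same thread reuse one instance.
    10.times { ids << thread_local_hydra.object_id }
  end
end
threads.each(&:join)

distinct = Set.new
distinct << ids.pop until ids.empty?
puts distinct.size  # one distinct handle per thread
```

The same thread-local trick applies to any other libcurl wrapper object the consuming library might be caching globally.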

@hanshasselberg

@headius thanks for looking into this. There is a maybe related issue here: typhoeus/ethon#79 but again without code. :(

@jgwmaxwell
Author

Hi, OP here - sadly I don't have the code that was triggering this any more, although I can say that there was something funny going on with the combination of usage and TorqueBox's message processors, as the code itself never presented issues when run directly in JRuby.

I'm guessing that you've got the cause there @headius - it must have been something to do with incorrect thread handling and inadvertently sharing an instance of the typhoeus hydra. Sorry I can't contribute actual code.

@leifg

leifg commented Apr 15, 2014

OK I created a repo to reproduce it.

It's here

However, I'm using the following curl version on an Ubuntu system, which is not the newest:

curl 7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
Protocols: dict file ftp ftps gopher http https imap imaps ldap pop3 pop3s rtmp rtsp smtp smtps telnet tftp
Features: GSS-Negotiate IDN IPv6 Largefile NTLM NTLM_WB SSL libz TLS-SRP

@enebo enebo added this to the Invalid or Duplicate milestone Feb 17, 2017
@enebo
Member

enebo commented Feb 17, 2017

Due to the age of this issue, and the fact that I ran the repro from the last reporter without seeing a crash, I will assume this is either working or so old it no longer triggers (although I am using JRuby 9k instead of 1.7.x, since we are EOL'ing 1.7.x soon). If someone has more knowledge of this issue, please open a new issue.

@enebo enebo closed this as completed Feb 17, 2017
kyrylo added a commit to airbrake/airbrake that referenced this issue Sep 21, 2021
This spec segfaults on latest JRuby 9.2.19.0 in JVM.
https://app.circleci.com/pipelines/github/airbrake/airbrake/227/workflows/3db05cae-87d8-4599-9da2-a37c4c76bf4f/jobs/9670

During an extensive debugging session I haven't been able to identify the root
cause.

I found an identical report from 2012 but it had no resolution:
typhoeus/typhoeus#202

I also found related links:
* jruby/jruby#231
* jruby/jruby#752