FFI/LibCurl buffer overflow under heavy load. #752
Comments
Is it possible to write a test case that reproduces this? Since it is on the curl side, it would probably just be something that spawns off a lot of requests in separate threads, I suspect, but extracting the way your code sets it up would be helpful.
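For concreteness, a reproduction along those lines might be as simple as the sketch below (placeholder URL, arbitrary thread and request counts; assumes the typhoeus gem running on JRuby):

```ruby
require 'typhoeus'

# Crude stress test: many threads, each firing a stream of requests at a
# local endpoint, to see whether the native (FFI/libcurl) layer trips.
URL = 'http://127.0.0.1:4567/ping' # placeholder endpoint

threads = Array.new(20) do
  Thread.new do
    500.times { Typhoeus.get(URL) }
  end
end
threads.each(&:join)
```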
Sure thing, I won't get a chance until the weekend, but I will look into that and post back here.
No hurry ... I'm hoping to look at it sometime next week. I'm assuming it's a bug in the way FFI reclaims memory allocations.
Is there anything new with this? I get this error after ~7000 requests (200 in parallel). It works perfectly fine with MRI. My env:
@leifg We never got a reproduction from OP, so this did not move forward. The backtrace also indicates it's blowing up well inside libcurl, so I am wondering if perhaps libcurl isn't thread-safe for all uses. If it's in JRuby/FFI, I'd suspect some data structure improperly shared across threads, like a pre-allocated struct of some kind. It's also possible that the autopointer code has some threading issue that's causing a pointer to be shared or obliterated too soon.
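As a rough illustration of the autopointer mechanism being speculated about (not JRuby's or Ethon's actual internals): an FFI::AutoPointer frees its native memory once the Ruby wrapper is garbage-collected, so a wrapper shared across threads and released too early would leave another thread writing into freed memory.

```ruby
require 'ffi'

# Minimal AutoPointer example using libc malloc/free; illustrative only.
module LibC
  extend FFI::Library
  ffi_lib FFI::Library::LIBC
  attach_function :malloc, [:size_t], :pointer
  attach_function :free,   [:pointer], :void
end

class ManagedBuffer < FFI::AutoPointer
  # Called by FFI when the Ruby wrapper object is garbage-collected.
  def self.release(ptr)
    LibC.free(ptr)
  end
end

buf = ManagedBuffer.new(LibC.malloc(256))
# If another thread still held buf's raw address after this wrapper was
# collected and released, its writes would land in freed memory.
```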
Interestingly, here's a report with a nearly identical error, but in this case it's on MRI using curb: sidekiq/sidekiq#1400. I'm starting to think this is a libcurl issue or bad libcurl usage by some library.
OK, I'm having trouble figuring out what library actually uses Ethon. The current theory is that the user is sharing a "multi" instance across threads. The libcurl documentation explicitly warns against sharing handles across threads, probably because internal buffers might be overflowed (as is the case here). I could really use some kind of reproduction, or at least a Gemfile and help investigating who might be the bad libcurl consumer.
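To make that theory concrete, the suspected anti-pattern would look roughly like this (hypothetical URL; Typhoeus shown, but the same applies to driving a raw Ethon::Multi from several threads):

```ruby
require 'typhoeus'

# Anti-pattern under suspicion: one Hydra (which wraps a libcurl "multi"
# handle) driven from several threads at once. libcurl handles are not
# safe to share across threads, so concurrent #run calls can corrupt the
# handle's internal buffers.
SHARED_HYDRA = Typhoeus::Hydra.new

threads = Array.new(10) do
  Thread.new do
    100.times do
      SHARED_HYDRA.queue(Typhoeus::Request.new('http://127.0.0.1:4567/ping'))
      SHARED_HYDRA.run
    end
  end
end
threads.each(&:join)
```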
@headius thanks for looking into this. There is a possibly related issue here: typhoeus/ethon#79, but again without code. :(
Hi, OP here - sadly I don't have the code that was triggering this any more, although I can say that there was something funny going on with the combination of that usage and TorqueBox's message processors, as the code itself never presented issues when run directly in JRuby. I'm guessing that you've got the cause there @headius - it must have been something to do with incorrect thread handling and inadvertently sharing an instance of the Typhoeus hydra. Sorry I can't contribute actual code.
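For anyone who lands here with the same symptom, the usual workaround is to confine each hydra to a single thread so that no libcurl handle is ever touched concurrently; a minimal sketch (hypothetical URL):

```ruby
require 'typhoeus'

# One Hydra per thread: each thread owns its own libcurl multi handle.
threads = Array.new(10) do
  Thread.new do
    hydra = Typhoeus::Hydra.new
    5.times { hydra.queue(Typhoeus::Request.new('http://127.0.0.1:4567/ping')) }
    hydra.run
  end
end
threads.each(&:join)
```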
OK, I created a repo to reproduce it. It's here. However, I'm using this version of curl on an Ubuntu system, which is not the newest version.
Due to the age of this issue, and the fact that I ran the repo from the last reporter without seeing a crash, I will assume this is working or is just so old it no longer triggers (although I am using JRuby 9k instead of 1.7.x, since we are EOL'ing 1.7.x soon). If anyone has more knowledge on this issue, please open a new issue for it.
This spec segfaults on the latest JRuby (9.2.19.0) on the JVM: https://app.circleci.com/pipelines/github/airbrake/airbrake/227/workflows/3db05cae-87d8-4599-9da2-a37c4c76bf4f/jobs/9670
During an extensive debugging session I haven't been able to identify the root cause. I found an identical report from 2012, but it had no resolution: typhoeus/typhoeus#202. I also found related links:
* jruby/jruby#231
* jruby/jruby#752
I mentioned this as part of #231 but reopening as a new issue.
Situation:
Server:
JRuby:
Java (Oracle):
Affected Gems:
Stack Trace:
https://gist.github.com/jgwmaxwell/5621074
Outline:
Running Typhoeus with the #in_parallel request grouping method in a TorqueBox 'Processor' works fine until the load gets pretty high, at which point it blows up with the stack trace above. The problems come when I've got 10 threads, each dispatching 3-5 requests in parallel, running approximately 1.5k 'jobs' per minute between them, or 4.5k-7.5k req/min.
The server is fine on all RAM meters, isn't swapping, and CPU is sitting around 85% (this is also serving web requests, servicing STOMP clients and doing a few other things at the same time).
It's not a production critical issue for us, as we can simply swap Typhoeus out, but thought it might be interesting to raise!
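For readers unfamiliar with that setup, roughly what each of those 'jobs' does is sketched below, written against the plain Hydra queue/run API rather than the #in_parallel wrapper (hypothetical URLs; ten such processors run concurrently inside TorqueBox):

```ruby
require 'typhoeus'

# Roughly what one "job" does: batch a handful of requests and run them
# in parallel, then collect the responses.
def run_job(urls)
  hydra    = Typhoeus::Hydra.new
  requests = urls.map { |u| Typhoeus::Request.new(u) }
  requests.each { |r| hydra.queue(r) }
  hydra.run
  requests.map(&:response)
end

run_job(Array.new(4) { 'http://127.0.0.1:4567/ping' })
```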