Poor performance compared to MRI on AWS + PG #3153
My first run of this, with defaults everywhere (e.g. in SQS), finished 150k records in 2:08 at a rate of about 1160 items/s. This is running with 10 threads on my MBP, so I'm confused where the 500 items/s number comes from. Will try to get an MRI comparison to ensure I'm actually seeing the slow perf.
It appears that running with -Xjit.threshold=0, as @yorickpeterse originally did, actually slows this benchmark down a fair bit. The same run only achieved about 900 items/s on my system. This could indicate that there's a performance bottleneck in the JIT that this code is hitting.
First discovery, though it did not help performance as far as I could see: jruby/jruby-openssl#55
Another discovery: with invokedynamic disabled, the benchmark ramps up much faster but appears to have a lower peak throughput. It does, however, show no degradation from forcing JIT. With indy off and -Xjit.threshold=0, the job completes in 2:13 with an overall rate of 1120/s. This is roughly the same as indy on with the normal JIT threshold. With indy off and the normal JIT threshold (basically no command-line args at all) the numbers are largely unchanged. It may be that there's an allocation bottleneck causing these numbers to all end up roughly the same.
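To make the warm-up and plateau behaviour easier to compare across flag combinations, a small windowed-throughput harness helps. The sketch below is a hedged, minimal stand-in for the gist's workload, not the actual script: `process_item`, the thread/record counts, and the env-var knobs are all assumptions for illustration.

```ruby
# Minimal sketch of a windowed throughput harness (not the gist's code).
# Run it under different flag combinations (defaults, -Xjit.threshold=0,
# invokedynamic on/off) and compare the per-window items/s over time.
require 'thread'

THREADS = (ENV['THREADS'] || 10).to_i      # assumed knob, not from the gist
ITEMS   = (ENV['ITEMS']   || 150_000).to_i # assumed knob, not from the gist
WINDOW  = 5                                # seconds between progress reports

def process_item(i)
  i.to_s * 10 # hypothetical placeholder for the real per-record work
end

queue = Queue.new
ITEMS.times { |i| queue << i }
THREADS.times { queue << :stop }

count = 0
lock  = Mutex.new
start = Time.now

reporter = Thread.new do
  last = 0
  loop do
    sleep WINDOW
    now = lock.synchronize { count }
    puts format('%8d done, %8.1f items/s in last %ds', now, (now - last) / WINDOW.to_f, WINDOW)
    last = now
  end
end

workers = Array.new(THREADS) do
  Thread.new do
    while (item = queue.pop) != :stop
      process_item(item)
      lock.synchronize { count += 1 }
    end
  end
end

workers.each(&:join)
reporter.kill
puts format('overall: %.1f items/s', ITEMS / (Time.now - start))
```

Note that with many threads the single shared Mutex around the counter can itself become a bottleneck; per-thread counters (or a `java.util.concurrent.atomic.AtomicLong` when running on JRuby) would avoid that if the harness is used for anything beyond a rough comparison.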
D'oh. I realized that the consistent top rate may be due to my upstream pipe being rather slow. I saw my network max out around 400 KB/s, which is around 3.2 Mb/s of my 5 Mb/s upstream. I'm going to run some numbers from a big fast EC2 instance.
It's worth mentioning that I ran the script with Java 8 (OpenJDK) with invokedynamic and the JIT threshold set to 0, on a c3.8xlarge EC2 instance. Using this setup I never managed to get it above 500-or-so jobs per second. This was using JRuby 9k RC2.
Ok, so on an EC2 m1.xlarge, the first 150k run ended up at almost 2000 items/s and was still climbing when it completed. I increased the number of dummy records by a factor of ten, and the ultimate result (1.5M records) was significantly better than 500 items/s. Here are the flags and their scores:

no flags: 8:43, 2867.09/s

I'm not sure about the relative performance of an m1.xlarge versus a c3.8xlarge, but this already seems well over the 500/s you say you couldn't exceed. Something's not adding up. Measurements were with JRuby master on Java 1.7.0u55.
Same run, same everything except on a c3.8xlarge with indy on, jit normal: 1:39, 15000.75/s. @yorickpeterse Are you sure this script is representative of the 500/s you were getting? Did you maybe have --dev in your JRUBY_OPTS or something?
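As a quick sanity check for exactly that kind of environment difference, logging the runtime setup at the top of the benchmark makes runs directly comparable. This is only a suggested addition (not something in the gist):

```ruby
# Hedged sketch: print the runtime setup at the start of the benchmark so runs
# on different machines and with different JRUBY_OPTS can be compared directly.
puts RUBY_DESCRIPTION                           # engine, version and JVM details
puts "JRUBY_OPTS: #{ENV['JRUBY_OPTS'].inspect}" # would reveal a stray --dev
puts "JAVA_OPTS:  #{ENV['JAVA_OPTS'].inspect}"  # and any JVM-level overrides
```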
Sorry, not quite the same run. I bumped up the number of threads to 64 (the c3.8xlarge's 32 threads * 2).
The following script (or a similar version using live data) has been reported by @yorickpeterse as performing significantly worse than MRI, with many JRuby threads doing only 500 things/sec versus a single MRI thread doing 2500 things/sec.
https://gist.github.com/brixen/bc9e2a88338439bee855
There are a number of possible reasons for this. The most likely in my mind is that one of the libraries specific to JRuby has a bottleneck. It's possible it is in the JRuby runtime itself, but this poor performance has been reported against both 1.7 and 9k.
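If the bottleneck is in one of the libraries, per-phase timing is the quickest way to narrow it down. The sketch below is a hedged illustration only: the phase names and the `fetch_rows`/`push_to_sqs` helpers are hypothetical, and the real call sites would be whatever the gist's script actually does.

```ruby
# Hedged sketch: accumulate wall-clock time per phase across all worker threads
# and dump the totals at exit, to see whether the time goes into the PG driver,
# the SQS client, serialization, or elsewhere.
PHASE_TOTALS = Hash.new(0.0)
PHASE_LOCK   = Mutex.new

def timed(phase)
  t0 = Time.now
  result = yield
  elapsed = Time.now - t0
  PHASE_LOCK.synchronize { PHASE_TOTALS[phase] += elapsed }
  result
end

# Usage inside the worker loop (names are illustrative, not from the gist):
#   rows = timed(:pg_query) { fetch_rows(connection, batch_size) }
#   timed(:sqs_push)        { push_to_sqs(sqs_client, rows) }

at_exit do
  PHASE_TOTALS.sort_by { |_, secs| -secs }.each do |phase, secs|
    puts format('%-12s %10.2fs', phase, secs)
  end
end
```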