
Poor performance compared to MRI on AWS + PG #3153

Open
headius opened this issue Jul 19, 2015 · 9 comments


headius commented Jul 19, 2015

The following script (or a similar version using live data) has been reported by @yorickpeterse as performing significantly worse on JRuby than on MRI, to the tune of many JRuby threads managing only 500 items/sec versus a single MRI thread doing 2500 items/sec.

https://gist.github.com/brixen/bc9e2a88338439bee855

There are a number of possible reasons for this. The most likely, in my mind, is that one of the libraries specific to JRuby has a bottleneck. It's possible the problem is in the JRuby runtime itself, but this poor performance has been reported against both 1.7 and 9k.
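For context, the workload is essentially a multi-threaded producer pushing records to SQS and measuring throughput. Below is a minimal sketch of that shape, not the gist itself; the queue URL, record count, thread count, and aws-sdk v2 usage are all illustrative:

```ruby
require "aws-sdk" # aws-sdk v2; illustrative only — the actual gist differs

QUEUE_URL = ENV.fetch("QUEUE_URL") # placeholder
RECORDS   = 150_000
THREADS   = 10

sqs   = Aws::SQS::Client.new
queue = Queue.new
RECORDS.times { |i| queue << "record-#{i}" }
THREADS.times { queue << nil } # one termination sentinel per worker

start   = Time.now
workers = THREADS.times.map do
  Thread.new do
    while (body = queue.pop) # nil sentinel ends the loop
      sqs.send_message(queue_url: QUEUE_URL, message_body: body)
    end
  end
end
workers.each(&:join)

elapsed = Time.now - start
puts format("%d records in %.1fs (%.2f items/s)", RECORDS, elapsed, RECORDS / elapsed)
```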


headius commented Jul 19, 2015

My first run of this, with defaults everywhere (e.g. in SQS), finished 150k records in 2:08 at a rate of about 1160 items/s. This is running with 10 threads on my MBP, so I'm confused where the 500 items/s number comes from. Will try to get an MRI comparison to ensure I'm actually seeing the slow perf.
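As a sanity check on that number (2:08 being 128 seconds of wall time):

```ruby
150_000 / 128.0 # => ~1172 items/s, in line with the reported ~1160
```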


headius commented Jul 19, 2015

It appears that running with -Xjit.threshold=0, as @yorickpeterse originally did, actually slows this benchmark down a fair bit. The same run only achieved about 900 items/s on my system. This could indicate that there's a performance bottleneck in the JIT that this code is hitting.
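For anyone reproducing this, the flag is passed to the jruby launcher directly (the script name here is a placeholder):

```sh
# force JRuby's JIT to compile methods on their first invocation
jruby -Xjit.threshold=0 bench.rb

# equivalent spelling as a JVM system property
jruby -J-Djruby.jit.threshold=0 bench.rb
```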


headius commented Jul 19, 2015

First discovery...but it did not help performance as far as I could see: jruby/jruby-openssl#55


headius commented Jul 19, 2015

Another discovery...with invokedynamic disabled, the benchmark gets up to speed much faster but appears to have a lower peak performance. It does, however, show no degradation from forcing the JIT. With indy off and jit.threshold=0, the job completes in 2:13 at an overall rate of 1120/s. This is roughly the same as indy on with the normal jit.threshold.

With indy off and normal jit.threshold (basically no command line args at all) the numbers are largely unchanged. It may be that there's an allocation bottleneck causing these numbers to all end up roughly the same.
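Concretely, the two configurations above look like this on the command line (script name again a placeholder):

```sh
# indy off, JIT forced at first call
jruby -Xcompile.invokedynamic=false -Xjit.threshold=0 bench.rb

# indy off, default JIT threshold
jruby -Xcompile.invokedynamic=false bench.rb
```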


headius commented Jul 19, 2015

D'oh. I realized that the consistent top rate may be due to my upstream pipe being rather slow. I saw my network max out around 400KB/s, which is around 3.2Mb/s of my 5Mb/s upstream. I'm going to run some numbers from a big fast EC2 instance.
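The unit conversion, spelled out:

```ruby
# 400 kilobytes/s of observed upstream traffic, expressed in megabits/s
400 * 8 / 1000.0 # => 3.2 Mb/s of the 5 Mb/s link
```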

@yorickpeterse

It's worth mentioning I ran the script under Java 8 (OpenJDK) with invokedynamic enabled and the JIT threshold set to 0, on a c3.8xlarge EC2 instance. With this setup I never managed to get it above 500-or-so jobs per second. This was using JRuby 9k RC2.
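For reproducibility, that setup corresponds roughly to the following invocation (script name is a placeholder, and it assumes the launcher resolves to an OpenJDK 8 JVM):

```sh
jruby -Xcompile.invokedynamic=true -Xjit.threshold=0 bench.rb
jruby -v   # reports the underlying JVM, to confirm which Java is in use
```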


headius commented Jul 19, 2015

Ok, so on an EC2 m1.xlarge, the first 150k run ended up at almost 2000 items/s and was still climbing when it completed. I increased the number of dummy records by a factor of ten, and the ultimate result (1.5M records) was significantly better than 500 items/s. Here are the various flags and their scores:

no flags: 8:43, 2867.09/s
indy: 9:00, 2776.71/s
indy, jit=0: 8:54, 2806.25/s

I'm not sure about the relative performance of an m1.xlarge versus a c3.8xlarge, but this already seems like it's well over the 500/s you say you couldn't exceed. Something's not adding up.
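Those rates are internally consistent with the reported durations; e.g. for the no-flags run:

```ruby
1_500_000 / (8 * 60 + 43).to_f # => ~2868 items/s over 8:43, matching the reported 2867.09/s
```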

Measurements were with JRuby master on Java 1.7.0u55.


headius commented Jul 19, 2015

Same run, same everything except on a c3.8xlarge with indy on, jit normal: 1:39, 15000.75/s.

@yorickpeterse Are you sure this script is representative of the 500/s you were getting? Did you maybe have --dev in your JRUBY_OPTS or something?
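For context, --dev tunes JRuby for fast startup at the expense of peak throughput, so it would explain a low ceiling on a long-running job. It is easy to pick up implicitly via the environment:

```sh
# check whether --dev (or other flags) are being injected into every run
echo $JRUBY_OPTS
```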


headius commented Jul 19, 2015

Sorry, not quite the same run. I bumped up the number of threads to 64 (the c3.8xlarge's 32 hardware threads * 2).
