Major performance problem with 0.9.20 and OpenJDK 1.8.0_72 #127

Closed
dgolombek opened this issue Mar 22, 2017 · 7 comments
Comments

@dgolombek
Contributor

dgolombek commented Mar 22, 2017

We have a service with ~2500 examples (with RSpec 3.5) that normally takes ~5 minutes to run. After upgrading from jruby-openssl 0.9.19 to 0.9.20 (and NO other changes), this same test suite times out after 75 minutes on our Jenkins builders. Everything runs at the same speed with both jruby-openssl versions on my local machine, a MacOS 10.11.6 box running

java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)

But the Jenkins server is an Ubuntu 12.04 large AWS instance, running

openjdk version "1.8.0_72-internal"
OpenJDK Runtime Environment (build 1.8.0_72-internal-b15)
OpenJDK 64-Bit Server VM (build 25.72-b15, mixed mode)

I've confirmed that all specs run happily w/out a network connection, so I don't think this is tied to any actual network traffic. The slowdown happens pretty rapidly -- after about 50 specs, specs that normally take < 1 second are taking two minutes.

I haven't yet spun up my own Ubuntu 12+OpenJDK box to start diving into what might be going on. What data can I provide to help debug this?

@kares
Member

kares commented Mar 22, 2017

The only change that could be affecting this is the BC upgrade to 1.56, so I am not really sure what data to recommend. In general, the best approach is to debug this at the JVM level (connect remotely via JMX if possible).

@dgolombek
Contributor Author

dgolombek commented Mar 22, 2017

One interesting wrinkle that I didn't note before -- I'd previously updated our service to jruby-jars 9.1.8.0, which includes jruby-openssl 0.9.20. However, we ALSO use lookout/fast-rsa-engine for performance improvements, which currently has an explicit requirement on both jruby-openssl and bcprov-jdk15on 1.50 -- thus overriding (I think) the jruby-openssl from jruby-jars.

So that raises a couple possibilities --

  1. Having multiple bcprov versions simultaneously is causing problems. We've seen this previously, but with different symptoms.
  2. Having duplicate copies of the jruby-openssl jar is suddenly causing problems.
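
One way to check the first possibility is to scan the jars that end up in the deployed application for more than one BouncyCastle provider version. The following is only a rough sketch: the helper name `bcprov_versions` is hypothetical, and it assumes the jars follow the usual `bcprov-jdk15on-<version>.jar` file naming.

```ruby
# Hypothetical helper: given a list of jar paths, return the distinct
# bcprov-jdk15on versions found among them (assumes Maven-style file names).
def bcprov_versions(paths)
  paths
    .map { |path| File.basename(path)[/\Abcprov-jdk15on-(\d+(?:\.\d+)*)\.jar\z/, 1] }
    .compact
    .uniq
    .sort
end

# Example with made-up paths; more than one entry means duplicate providers.
jars = ['/app/lib/bcprov-jdk15on-1.50.jar',
        '/gems/jars/bcprov-jdk15on-1.56.jar',
        '/app/lib/something-else.jar']
versions = bcprov_versions(jars)
puts "bcprov versions found: #{versions.join(', ')}"
```

In a real check the list would come from wherever the jars actually live, for example `Dir.glob` over the war's library directory.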

@mkristian do you have any thoughts here? Would it make sense to have lookout/fast-rsa-engine depend upon jruby-jars as a whole instead of jruby-openssl directly? And would it make sense to bump the default BCProv version in fast-rsa-engine?

@mkristian
Member

@dgolombek jruby-jars is just a convenient thing for warbler (and maybe other tools) to pick up jruby and use it to build the war-file. no application really depends on this gem as such. so for a gem to depend on it makes no sense, since the ruby using it might be a different version altogether, etc

bringing the gem + jars versions in line is a good idea. it could very well be the case that lookout/fast-rsa-engine does not work with newer BC versions and thus falls back to the slow RSA (just guessing) via some exception catching.

@kares
Member

kares commented Mar 30, 2017

could we confirm that this is not a jossl regression? e.g. by removing the fast-rsa-engine and confirming times for 0.9.19 vs 0.9.20

@mkristian
Member

having multiple versions of the jruby-openssl and bcprov jars can produce all kinds of class-loading problems. once the classloader is clean we will see how things work. @kares I am sure that this is not a jossl problem.

@the-michael-toy

the-michael-toy commented Apr 6, 2017

This thread on the JRuby mailing list seems to summarize the problem.

One person had narrowed it down to this one call

    OpenSSL::Cipher.new('aes-256-gcm').random_iv

I did the following steps on a VM with a fresh install

% uname -a
Linux xenial 4.4.0-62-generic #83-Ubuntu SMP Wed Jan 18 14:10:15 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
% java -version
openjdk version "1.8.0_121"
OpenJDK Runtime Environment (build 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13)
OpenJDK 64-Bit Server VM (build 25.121-b13, mixed mode)

(Anecdotally, I don't think the JVM is the problem; we also saw the slowdown on some CI machines which are running 1.8.0_77.)

  1. Install JRuby 9.1.7.0 and jruby-openssl-0.9.19
  2. Time OpenSSL::Cipher.new
    • 0.053098 seconds
  3. Upgrade jruby-openssl to 0.9.20 (no change to JRuby version)
  4. Time OpenSSL::Cipher.new
    • 443.89 seconds, yes, 400+ seconds
  5. Edit $JAVA_HOME/jre/lib/security/java.security as suggested in the thread and change securerandom.source (still running jruby-openssl 0.9.20 and jruby 9.1.7.0)
  6. Time OpenSSL::Cipher.new
    • 0.046968 seconds
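
The timing steps above could be reproduced with something like the sketch below. It assumes the timed call is the `random_iv` one-liner quoted earlier; the helper name `time_random_iv` is made up for illustration, and the numbers it prints will depend entirely on the machine.

```ruby
require 'benchmark'
require 'openssl'

# Hypothetical helper: time one IV generation, the call reported to stall
# in this thread. Returns the elapsed wall-clock seconds and the IV.
def time_random_iv(cipher_name = 'aes-256-gcm')
  iv = nil
  seconds = Benchmark.realtime do
    iv = OpenSSL::Cipher.new(cipher_name).random_iv
  end
  [seconds, iv]
end

seconds, iv = time_random_iv
puts format('random_iv took %.6f seconds (%d-byte IV)', seconds, iv.bytesize)
```

Running it once per jruby-openssl version, and again after changing the JVM's random source, gives a side-by-side comparison without running the whole spec suite.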

@dgolombek
Contributor Author

@the-michael-toy that was it, thank you! I used -J-Djava.security.egd=file:/dev/./urandom instead of editing the java.security file, but it had the same effect.
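
To confirm which setting a given JRuby process actually picked up, a small diagnostic along these lines may help. This is a sketch, not something from the thread: `entropy_settings` is a hypothetical helper, it only does anything under JRuby, and it simply reads the `java.security.egd` system property and the `securerandom.source` security property.

```ruby
# Hypothetical helper: report the JVM's configured random sources.
# Returns nil on non-JRuby interpreters, where the question does not apply.
def entropy_settings
  return nil unless RUBY_PLATFORM == 'java'

  require 'java'
  {
    # Set by -J-Djava.security.egd=... on the JRuby command line.
    egd: java.lang.System.getProperty('java.security.egd'),
    # Read from the JVM's java.security configuration.
    source: java.security.Security.getProperty('securerandom.source')
  }
end

settings = entropy_settings
if settings
  puts "java.security.egd   = #{settings[:egd].inspect}"
  puts "securerandom.source = #{settings[:source].inspect}"
else
  puts 'Not running on JRuby; nothing to report.'
end
```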

I'd finally just gotten fast-rsa-engine updated as well as all our other BouncyCastle users, and was about to report that it had not solved the problem -- but it was definitely a good exercise to go through, and much better than our prior hack to avoid duplicate copies of the bcprov jar in our war...

I'll close this issue, since this is a Java/Linux problem, but this might be worth mentioning in the History file as a side-effect of the upgrade.
