SecureRandom performance could be better #1405
FWIW, the final results in that gist are (I believe) as much as 10x faster than they used to be. |
An additional discovery: It appears that the JDK spins up a thread to seed the random number generator (once) for each new thread that requests random data. This appears to happen with a new SecureRandom each time, a single shared SecureRandom, or a SecureRandom created once per thread. It may be unavoidable, but it should also be hidden from us and not matter a great deal unless people are spinning up a lot of new short-lived threads and requesting random data in each. |
@headius Another option is to specify the PRNG engine when creating the SecureRandom instance. @bbrowning reportedly saw the same performance increase using |
Including @emboss. |
I can circulate this PR around for comments |
@tarcieri That would be great. We're having trouble getting any consensus, even from folks we might consider experts in this area. |
If you're using Java's built-in SecureRandom once per thread, and they're all just backed by /dev/urandom, I think you should be totally fine. |
@tarcieri They're all backed by /dev/random at the beginning, but some continue to reseed from /dev/random and others are just algorithmic after that point. They can be configured to use urandom, which definitely speeds up throughput at the cost of good entropy. MRI appears to try to use OpenSSL's RAND functions, and if that's not available it uses urandom or the equivalent calls on Windows. I do not know how RAND compares to random/urandom. I'm going to go with a commit that uses per-thread SecureRandom for now and keep researching. |
I wouldn't place too much qualitative value on numbers coming from Most security-conscious software I know of uses |
It would be nice if the JVM defaulted to using /dev/urandom instead of /dev/random. As it stands, the only way to use /dev/urandom is to set a JVM system property, which makes for an inconvenient out-of-the-box experience. My vote is to keep the JVM default source but switch to SHA1PRNG instead of NativePRNG, so that /dev/random is only accessed to seed each SecureRandom instance. Then we have to decide how often to create SecureRandom instances. Is one per thread enough? Or should we create one that's valid for N uses, then throw it away and create a new one for the next N uses? Or some combination of the above? With a default Rails app this gets hit on every request, so we want to find the right balance between security and performance. |
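The "valid for N uses" idea could be sketched roughly as below. This is only an illustration: `RotatingGenerator`, `max_uses`, and the factory block are hypothetical names, not anything in JRuby. On MRI the factory just returns the `SecureRandom` module so the sketch runs anywhere; on JRuby it could instead build a fresh `java.security.SecureRandom`.

```ruby
require "securerandom"

# Hypothetical wrapper: hands out a generator built by the factory block and
# replaces it after max_uses calls. Not part of JRuby.
class RotatingGenerator
  attr_reader :generation

  def initialize(max_uses, &factory)
    raise ArgumentError, "max_uses must be positive" unless max_uses.positive?

    @max_uses = max_uses
    @factory = factory
    @mutex = Mutex.new
    @uses = 0
    @generation = 0
    @current = nil
  end

  # Yields the current generator, building a fresh one every max_uses calls.
  def with_generator
    generator = @mutex.synchronize do
      if @current.nil? || @uses >= @max_uses
        @current = @factory.call
        @uses = 0
        @generation += 1
      end
      @uses += 1
      @current
    end
    yield generator
  end
end

# Stand-in factory (assumption): returns the SecureRandom module itself.
ROTATING = RotatingGenerator.new(3) { SecureRandom }
HEX_SAMPLES = Array.new(7) { ROTATING.with_generator { |g| g.hex(16) } }
```

The mutex makes the wrapper safe to share, but it also reintroduces the contention that a per-thread generator avoids, so the two ideas would probably be combined (one rotating generator per thread) rather than used as alternatives.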
My concern about hardcoding SHA1PRNG is that it never reseeds unless we reseed it. At least /dev/urandom will try to maintain a pool of entropy as it runs. I worry that hardcoding SHA1PRNG could lead to an attacker being able to guess random numbers over time. I've tried to figure out how to get the JDK to pick the faster urandom source without specifying the property, but I have been unsuccessful. I will take another look. |
You will probably get the best security properties by reading from Trusting the OS's PRNG is probably your best bet. |
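For comparison, deferring entirely to the operating system from plain Ruby might look like the sketch below. It assumes Ruby 2.5 or later, where `Random.urandom` returns bytes from the platform's own facility; `os_random_hex` is a hypothetical helper name, and this is not a claim about what JRuby does internally.

```ruby
# Returns `size` bytes from the platform's random source, hex-encoded.
# Random.urandom raises RuntimeError if the platform source is unavailable.
def os_random_hex(size)
  Random.urandom(size).unpack1("H*")
end

puts os_random_hex(16)
```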
Ok, I think I have finally figured out how OpenJDK sets up these PRNGs.
By default, on all systems I tested, it's using a native PRNG that always reads from /dev/random. This is not the fastest one, since it has extra levels of synchronization and buffering overhead to read from a globally-shared FileInputStream.
If you specify the /dev/random device in the egd property, or request the SHA1PRNG algorithm, a different algorithm is used that initially seeds from /dev/random and then uses an algorithmic RNG based on repeated SHA1 hashing of internal state. This appears to be the fastest option, but it only seeds once. And if I'm reading the code right, it seeds a global SHA1PRNG once, and then uses bytes from that SHA1PRNG as the seed for all future instances. HOWEVER, it does appear that if we request seed bytes from SecureRandom, it always goes back to the engine, which in this case will go to the native /dev/random device. See sun.security.provider.SecureRandom, sun.security.provider.SeedGenerator.
If you specify the /dev/urandom device in the egd property, most systems appear to use a NativePRNG that uses /dev/urandom. This appears to be a bit faster than the /dev/random NativePRNG, but not as fast as the SHA1PRNG. I have not been able to find a way to select this PRNG from within an already-running JVM. For the NativePRNG logic, see sun.security.provider.NativePRNG under the solaris src tree in OpenJDK.
If you specify a device that does not exist, OpenJDK uses a SHA1PRNG that uses a background thread to generate entropy. You can see these threads running by passing --sample to JRuby or -Xprof to the JVM.
Interestingly, the Android fix appears to go all the way to forcing /dev/urandom for all random bytes. I'm not sure if they did that because it didn't seed at all, or because it was similar to the OpenJDK impl in that it only seeded once.
Given this analysis, I think this is where we stand:
If it's acceptable for MRI to use an algorithmic RNG by default, it may be acceptable for us to do the same. But if they are at risk in that case, we would not do well to follow them. |
If you'd like to test your random number generation empirically, you could use a tool like Ent: http://www.fourmilab.ch/random/ I'd suggest generating a LOT (tens of gigabytes+) of random data on a variety of platforms though. Ent will, in theory, tell you if you're getting good random numbers. |
Update on performance changes to date. I have made the following changes, which are all now active on jruby-1_7:
I have not done any change to the default PRNG provider, so it should still be using /dev/random for all data on most platforms. Given these changes, our jruby-1_7 numbers are at the top.
|
@tarcieri I gave ent a try with the SHA1PRNG, and the numbers are somewhat surprising. It appears to be nearly as good as the random devices. My /dev/random results largely matched the Chi Squared on the ent site, so I won't post them. But here are the numbers for SHA1PRNG explicitly requested at SecureRandom construction time.
This is just one metric, but it's starting to look like OpenJDK's SHA1PRNG isn't too bad. |
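For readers who want a feel for the chi-square figure that ent reports, here is a minimal sketch of the same kind of statistic in Ruby. It is not ent's implementation and `byte_chi_square` is a hypothetical helper; it computes Pearson's chi-square over byte frequencies against a uniform distribution, where a healthy source should land somewhere near the 255 degrees of freedom.

```ruby
require "securerandom"

# Pearson chi-square statistic for the byte frequencies in `bytes`,
# compared with a uniform distribution over all 256 byte values.
def byte_chi_square(bytes)
  raise ArgumentError, "need at least one byte" if bytes.empty?

  counts = Array.new(256, 0)
  bytes.each_byte { |b| counts[b] += 1 }
  expected = bytes.bytesize / 256.0
  counts.sum { |c| (c - expected)**2 / expected }
end

sample = SecureRandom.random_bytes(1_000_000)
puts format("chi-square over %d bytes: %.2f", sample.bytesize, byte_chi_square(sample))
```

A single statistic like this can flag a badly broken generator, but a good score says nothing about predictability, which is the concern raised earlier about an algorithmic PRNG that never reseeds.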
Ok, waking this one up again... There are still outstanding reports of blocking or slow behavior using SecureRandom, like nats-io/nats.rb#123 and this comment by @cheald. It seems we're still not using whatever is equivalent to MRI, since as far as I know MRI does not get reports of entropy-starved blocking behaviors. So...anyone still seeing this problem that can help us explore solutions? @cheald @trinode @digitalextremist @tarcieri @wallyqs |
I haven't seen it lately since I've added entropy daemons to my standard saltstack on new machines, but it might be possible to replicate it on a fresh VM. I'll run some experiments on AWS VMs. |
I'm unable to replicate the issue in a fresh VM here. Amazon Linux, no entropy daemon. I'm keeping the entropy pool drained:
Then installed JRuby and benchmarked SecureRandom:
Despite the system being entropy-starved, it had no problem creating UUIDs. My previous issue may have been something other than SecureRandom. I don't recall the circumstances other than trying to re-establish my dev environment after a fresh OS install. |
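The benchmark used in that comment is not shown here. A rough stand-in, assuming only the standard `benchmark` and `securerandom` libraries, might look like the following; `uuid_rate` is a hypothetical helper, and the iteration count is arbitrary.

```ruby
require "benchmark"
require "securerandom"

# Times `iterations` calls to SecureRandom.uuid and returns calls per second.
def uuid_rate(iterations)
  elapsed = Benchmark.realtime { iterations.times { SecureRandom.uuid } }
  iterations / elapsed
end

puts format("%.0f UUIDs/sec", uuid_rate(100_000))
```

On an entropy-starved machine with a blocking PRNG, the symptom would be a long stall on the first calls rather than a uniformly low rate, so timing a small first batch separately may be more telling.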
@cheald Ok, thanks for following up. Hopefully if there's still an issue, someone else can reproduce it for us. Of course, I hope there's no longer still an issue :-) |
For what it's worth, here's what I netted out on for JRuby support in the https://github.com/cryptosphere/sysrandom/blob/master/lib/sysrandom.rb#L17 |
Running an entropy daemon ( |
We had another report of problems today. I have pushed to master a config property, jruby.preferred.prng, that can be used to select from the available PRNGs in the JDK. On my system (OS X, OracleJDK 8u92) and on the reporter's Linux, the available PRNGs are NativePRNG, SHA1PRNG, NativePRNGBlocking, and NativePRNGNonBlocking. By default, we try to use SHA1PRNG since we believed it consumes the least entropy. To get that list on your system, run this:
Java::sun.security.jca.Providers.getProviderList.providers.each do |p|
  p.services.each do |s|
    puts s.algorithm if s.type == "SecureRandom"
  end
end
|
I didn't really show how to use the property... it's I've kicked off a build for 9.1.3.0-SNAPSHOT on http://ci.jruby.org. Should be ready soon. |
Haven't had complaints yet, but I'm curious how it would work in some of the scenarios where people are having problems. |
Given the reports we're getting about SecureRandom blocking for sometimes many seconds, I've done another commit to try the following providers in turn:
I've kicked off another build for http://ci.jruby.org. |
Another discussion of SecureRandom: https://tersesystems.com/2015/12/17/the-right-way-to-use-securerandom/ |
@tarcieri We are running absolutely everything in Docker. A few months ago we noticed that our CI build times were increasing during busy hours. Initially we thought it was caused by memory constraints on the build agents, but a couple of days ago we suddenly hit an issue where our Rails apps failed to boot in production within the 5-minute initialization window defined by the healthcheck. We were unable to reproduce the problem on developer machines, so it was clear that something was wrong with the deployment environment. After investigation I found that the boot process was stuck on this call:
def generate_manifest_path
  ".sprockets-manifest-#{SecureRandom.hex(16)}.json"
end
We checked metrics from Prometheus node_exporter and found that our entropy pool was completely dry ;) We've deployed
I have no idea who is responsible for the pool exhaustion. I suspect it might have something to do with Rancher's IPSec networking, because it's the only thing common to all hosts, and we see pretty much the same exhaustion pattern on all hosts, even though they don't run anything JRuby-ish. So no wonder that @headius heard stories about 30-minute Rails boot times :) |
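To check the entropy pool on a single Linux host without a metrics stack, the kernel's estimate can be read directly from procfs. The sketch below assumes a Linux system exposing /proc/sys/kernel/random/entropy_avail; `entropy_available` is a hypothetical helper and returns nil elsewhere.

```ruby
ENTROPY_AVAIL_PATH = "/proc/sys/kernel/random/entropy_avail"

# Returns the kernel's current entropy estimate in bits, or nil when the
# file is not readable (for example on non-Linux systems).
def entropy_available(path = ENTROPY_AVAIL_PATH)
  return nil unless File.readable?(path)

  Integer(File.read(path).strip)
end

bits = entropy_available
puts bits.nil? ? "entropy estimate not available on this platform" : "entropy_avail: #{bits} bits"
```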
@y8 If you are able to test my change in production, hopefully the nonblocking PRNG will be the solution to all our problems :-) We are starting to stabilize for 9.1.3.0, so let us know when you're able to confirm. |
Well I'm going to optimistically call this good. On Java 8, we will attempt to use the NativePRNGNonBlocking. If that's not available, or on other platforms, we will try to use the SHA1PRNG. After that we just let the JDK decide. If someone's still able to see entropy starvation with JRuby 9.1.3.0 (master right now), please let us know ASAP. |
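The fallback order described above can be sketched as a small selection function. This is an illustration of the ordering only, not the JRuby source: `PREFERRED_PRNGS` and `choose_prng` are hypothetical names. On JRuby, the chosen name would presumably be handed to `java.security.SecureRandom.getInstance`, and a nil result would mean falling back to a plain `new SecureRandom`.

```ruby
# Preference order taken from the comment above.
PREFERRED_PRNGS = %w[NativePRNGNonBlocking SHA1PRNG].freeze

# Returns the first preferred algorithm present in `available`, or nil to
# signal "let the JDK pick its default".
def choose_prng(available, preferred = PREFERRED_PRNGS)
  preferred.find { |name| available.include?(name) }
end

# Algorithm names as listed earlier in this thread for OracleJDK 8u92.
java8_algorithms = %w[NativePRNG SHA1PRNG NativePRNGBlocking NativePRNGNonBlocking]
puts choose_prng(java8_algorithms).inspect
puts choose_prng(%w[NativePRNG]).inspect
```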
I botched the previous patch a bit by unconditionally re-assigning the secureRandom local to a default JDK new SecureRandom. This could cause systems without the default preferred PRNG (NativePRNGNonBlocking, Java 8+) to have slower thread startup and/or random number generation.
Our SecureRandom in 1.7.10 (from securerandom.rb) was implemented using Java integration calling out to java.security.SecureRandom. It constructed a new instance each time.
In 09b169a, I moved the meat of the logic (#random_bytes) into Java code, which roughly doubled performance on my machine, bringing it close to MRI.
@bbrowning and @nirvdrum discovered a JVM property that, when passed, sets SecureRandom to use /dev/urandom or /dev/random, which improves performance further: java.security.egd=/dev/random. On my system, this again doubles performance, but the format appears to vary across JVMs.
Finally, I made a modification that constructs a SecureRandom per thread, avoiding synchronization contention, and started to move other methods into Java starting with #hex. This brought about the best performance so far, but I have not committed this change because I'm unsure about saving the SecureRandom per thread and whether that does something "bad" as far as the security of its results.
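The per-thread idea can be illustrated in plain Ruby with thread-local variables. This is a sketch of the caching pattern only, not the linked patch (which does this in Java): `PerThreadGenerator` is a hypothetical helper, and `Object.new` stands in for the generator so the example runs on any Ruby. On JRuby the block could return a `java.security.SecureRandom` instead.

```ruby
# Hypothetical helper: lazily builds one generator per thread from the block
# and reuses it on later calls from the same thread.
module PerThreadGenerator
  KEY = :per_thread_secure_random

  def self.fetch
    thread = Thread.current
    generator = thread.thread_variable_get(KEY)
    if generator.nil?
      generator = yield
      thread.thread_variable_set(KEY, generator)
    end
    generator
  end
end

main_gen  = PerThreadGenerator.fetch { Object.new }
same_gen  = PerThreadGenerator.fetch { Object.new }
other_gen = Thread.new { PerThreadGenerator.fetch { Object.new } }.value

puts main_gen.equal?(same_gen)   # same thread reuses its instance
puts main_gen.equal?(other_gen)  # another thread gets its own instance
```

Because no instance is shared across threads, no lock is needed on the hot path. The trade-off is that each new thread pays the seeding cost once, which matters for workloads that create many short-lived threads.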
We need to evaluate all these options and come up with a good balance of security and performance, since it seems that SecureRandom is hit very heavily by frameworks like Rails.
Performance with per-thread SecureRandom and the Java property across JVMs on OS X: https://gist.github.com/headius/8428930
Patch to use a SecureRandom per thread and make #hex native: https://gist.github.com/headius/8429206