-
-
Notifications
You must be signed in to change notification settings - Fork 925
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add salt to Array#hash #2452
Comments
Copying @Who828. |
Doesn't need to be cryptographically secure though it's not terrible. It just needs to be on a per ruby env basis and decently random. I think the idea is just to avoid collisions with objects from other runtimes. Using a CSPRNG would have a higher runtime cost than something closer to MRI's pointer, the runtime's java hash perhaps? |
It does need to be cryptographically secure to avoid hashDoS. That's why SipHash (which effectively computes a cryptographic MAC) is used instead of e.g. universal hashing. That's perhaps a tad conservative outlook, but it's one coming from a guy whose Twitter handle is "hashbreaker" (djb). All that said, I'm not sure any of what I just said applies to |
I did some reading on the switch to SipHash right before i saw this comment. You're right it should be from a CSPRNG where possible. This salt seems to be for preventing pre computing hash values in search of collisions so a hashDoS can be performed. Also, it does apply to all implementations/overrides of #hash since that's what the attack leveraged in the first place. That said, the hit to startup for a CSPRNG source on some Java implementations may be too long. I think, on those systems only, a "decent" random will be good enough in this case. To me, this suggests an unseeded SecureRandom call to let the jvm determine the best random source and give us a few bytes. |
For performance reasons (and because we don't really believe in hashDoS) JRuby uses perlhash by default, because as far as we've heard there's not a reliable exploit for it. MRI is still using murmurhash for Array#hash calculation, so they're not getting full siphash security either. I think we all agree we need some salt in there, but I'm not sure we need to go beyond what MRI does with some weakly random salt like a function pointer. @Who828 is working on fixing our Array#hash impl and for the moment I have suggested System.identityHashCode(runtime), which should be different per JVM process. |
Ok, so I'm wrong. We'll need something else that's lightweight and as pseudo-random as a function pointer is for MRI.
|
Pull the hashCode of the current Runtime. |
@headius your best bet is to grab some cryptographically random value at VM start. I know the JVM likes to read from I think you can just grab it at boot so it's unique per VM instance, like @tduehr was saying |
IMO a cryptographically secure seed would be the safe choice, as @tduehr and @tarcieri proposed. For what it's worth, this is how Java 7 used to seed String hashing, and here's yet another way. Both are not cryptographically secure, though. The hashDoS attack on Murmur 2 and 3 in 2012 didn't even rely on the seed, it would have worked even when seeding with perfectly cryptographically secure random data. According to this article, there are three requirements to counter hashDoS-style collision attacks:
That's what a cryptographically secure hash like SipHash or the slower SHA-2/3 etc. functions provide. MurmurHash 2/3 failed in this regard, it was possible to create collisions rergardless of the seed. To be fair, it was never designed for this purpose.
Quoting the SipHash paper:
Like @tduehr already mentioned, I too believe that the shorter outputs of SipHash, Murmur or any other general-purpose hash function used in this context could potentially be vulnerable to offline brute force attacks if the seed was fixed or could be predicted easily.
This would obviously be the easiest way to prevent attacks, but it is typically an application-level concern and not applicable universally. So it seems like a cryptographically secure value is the safest bet, on the other hand I am not aware if any of the schemes using a less secure random value have been attacked successfully. Either way, the implementation can be problematic. |
We have patched in other commits the logic to prepare a random number generator, a la @nirvdrum findings and so on. I think we can just pull from the the same store to get a good salt and cache that somewhere statically or on a per-instance basis. @Who828 Can you make that small change to your PR? I discovered that identityHashCode of the current Ruby instance does not change much across runs. |
For #2437, we partially aligned our Array#hash impl with MRI by making it use our default hashing algorithm (either perlhash or siphash) to calculate its hash value based on contents. However, we missed one piece of the MRI logic that needed a bit more discussion: pseudo-random salt.
MRI uses the address of the rb_ary_hash function as salt for calculating hash. This may or may not be different between runs, so it's not cryptographically sound salt, but it does make these hash values less predictable on some systems.
So...go as far as truly cryptographically-sound salt? Don't bother with salt at all?
Copying @tarcieri, @tduehr, @enebo.
The text was updated successfully, but these errors were encountered: