Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add salt to Array#hash #2452

Closed
headius opened this issue Jan 12, 2015 · 11 comments
Closed

Add salt to Array#hash #2452

headius opened this issue Jan 12, 2015 · 11 comments

Comments

@headius
Copy link
Member

headius commented Jan 12, 2015

For #2437, we partially aligned our Array#hash impl with MRI by making it use our default hashing algorithm (either perlhash or siphash) to calculate its hash value based on contents. However, we missed one piece of the MRI logic that needed a bit more discussion: pseudo-random salt.

MRI uses the address of the rb_ary_hash function as salt for calculating hash. This may or may not be different between runs, so it's not cryptographically sound salt, but it does make these hash values less predictable on some systems.

So...go as far as truly cryptographically-sound salt? Don't bother with salt at all?

Copying @tarcieri, @tduehr, @enebo.

@headius
Copy link
Member Author

headius commented Jan 12, 2015

Copying @Who828.

@tduehr
Copy link
Contributor

tduehr commented Jan 12, 2015

Doesn't need to be cryptographically secure though it's not terrible. It just needs to be on a per ruby env basis and decently random.

I think the idea is just to avoid collisions with objects from other runtimes. Using a CSPRNG would have a higher runtime cost than something closer to MRI's pointer, the runtime's java hash perhaps?

@tarcieri
Copy link

It does need to be cryptographically secure to avoid hashDoS. That's why SipHash (which effectively computes a cryptographic MAC) is used instead of e.g. universal hashing. That's perhaps a tad conservative outlook, but it's one coming from a guy whose Twitter handle is "hashbreaker" (djb).

All that said, I'm not sure any of what I just said applies to Array#hash. I'm not sure what the attack would be there...

@tduehr
Copy link
Contributor

tduehr commented Jan 12, 2015

I did some reading on the switch to SipHash right before i saw this comment. You're right it should be from a CSPRNG where possible. This salt seems to be for preventing pre computing hash values in search of collisions so a hashDoS can be performed.

Also, it does apply to all implementations/overrides of #hash since that's what the attack leveraged in the first place.

That said, the hit to startup for a CSPRNG source on some Java implementations may be too long. I think, on those systems only, a "decent" random will be good enough in this case. To me, this suggests an unseeded SecureRandom call to let the jvm determine the best random source and give us a few bytes.

@headius
Copy link
Member Author

headius commented Jan 12, 2015

For performance reasons (and because we don't really believe in hashDoS) JRuby uses perlhash by default, because as far as we've heard there's not a reliable exploit for it. MRI is still using murmurhash for Array#hash calculation, so they're not getting full siphash security either.

I think we all agree we need some salt in there, but I'm not sure we need to go beyond what MRI does with some weakly random salt like a function pointer. @Who828 is working on fixing our Array#hash impl and for the moment I have suggested System.identityHashCode(runtime), which should be different per JVM process.

@headius
Copy link
Member Author

headius commented Jan 12, 2015

Ok, so I'm wrong. We'll need something else that's lightweight and as pseudo-random as a function pointer is for MRI.

~/projects/jruby $ jruby -e "p java.lang.System.identity_hash_code(JRuby.runtime)"
1252169911

~/projects/jruby $ jruby -e "p java.lang.System.identity_hash_code(JRuby.runtime)"
1252169911

~/projects/jruby $ jruby -e "p java.lang.System.identity_hash_code(JRuby.runtime)"
1252169911

@tduehr
Copy link
Contributor

tduehr commented Jan 12, 2015

Pull the hashCode of the current Runtime.

@tarcieri
Copy link

@headius your best bet is to grab some cryptographically random value at VM start. I know the JVM likes to read from /dev/random instead of /dev/urandom, so seems hard. If there's an internal CSPRNG you can grab it from, that'd be fine too.

I think you can just grab it at boot so it's unique per VM instance, like @tduehr was saying

@emboss
Copy link

emboss commented Jan 19, 2015

IMO a cryptographically secure seed would be the safe choice, as @tduehr and @tarcieri proposed.

For what it's worth, this is how Java 7 used to seed String hashing, and here's yet another way. Both are not cryptographically secure, though.

The hashDoS attack on Murmur 2 and 3 in 2012 didn't even rely on the seed, it would have worked even when seeding with perfectly cryptographically secure random data. According to this article, there are three requirements to counter hashDoS-style collision attacks:

  • The hash function does not allow multi-collision attacks

That's what a cryptographically secure hash like SipHash or the slower SHA-2/3 etc. functions provide. MurmurHash 2/3 failed in this regard, it was possible to create collisions rergardless of the seed. To be fair, it was never designed for this purpose.

  • The hash function uses at least a per-process hash seed randomization

Quoting the SipHash paper:

On startup a program reads a secret SipHash key from the operating system's cryptographic
random-number generator; the program then uses SipHash for all of its hash tables.

Like @tduehr already mentioned, I too believe that the shorter outputs of SipHash, Murmur or any other general-purpose hash function used in this context could potentially be vulnerable to offline brute force attacks if the seed was fixed or could be predicted easily.

  • The interface to untrusted potential attackers uses simple, hard limits on the number of keys it will accept

This would obviously be the easiest way to prevent attacks, but it is typically an application-level concern and not applicable universally.

So it seems like a cryptographically secure value is the safest bet, on the other hand I am not aware if any of the schemes using a less secure random value have been attacked successfully. Either way, the implementation can be problematic.

@headius
Copy link
Member Author

headius commented Jan 26, 2015

We have patched in other commits the logic to prepare a random number generator, a la @nirvdrum findings and so on. I think we can just pull from the the same store to get a good salt and cache that somewhere statically or on a per-instance basis.

@Who828 Can you make that small change to your PR? I discovered that identityHashCode of the current Ruby instance does not change much across runs.

@donv donv mentioned this issue Oct 9, 2015
@kares
Copy link
Member

kares commented Jun 22, 2017

#2437 has been merged and #2453 has been closed.
by default JRuby uses a secure-random generated (long) value as a hashing starter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants