Add salt to Array#hash #2452

headius · 2015-01-12T17:14:59Z

For #2437, we partially aligned our Array#hash impl with MRI by making it use our default hashing algorithm (either perlhash or siphash) to calculate its hash value based on contents. However, we missed one piece of the MRI logic that needed a bit more discussion: pseudo-random salt.

MRI uses the address of the rb_ary_hash function as salt for calculating hash. This may or may not be different between runs, so it's not cryptographically sound salt, but it does make these hash values less predictable on some systems.
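To make the role of the salt concrete, here is a minimal sketch of a salted array hash. It is not MRI's or JRuby's actual algorithm: the method name `salted_array_hash`, the FNV-style mixing step, and the example salt values are all assumptions chosen for illustration. The only point is that the salt is the starting value, so the same contents hash differently under a different salt.

```ruby
# Illustrative only: not the algorithm used by MRI or JRuby.
MASK64 = 0xFFFF_FFFF_FFFF_FFFF

# Combine the hashes of +elements+, starting from +salt+ instead of a
# fixed constant.
def salted_array_hash(elements, salt)
  h = (salt ^ elements.size) & MASK64
  elements.each do |e|
    # Multiply-then-xor mixing step (FNV-style), truncated to 64 bits.
    h = ((h * 0x100000001b3) ^ (e.hash & MASK64)) & MASK64
  end
  h
end

# Same contents, two hypothetical salts.
a = salted_array_hash([1, 2, 3], 0x1234)
b = salted_array_hash([1, 2, 3], 0x5678)
```

Within one process, `a` is reproducible for the same salt, while `b` differs because every mixing step is a bijection on 64-bit values. An attacker who cannot guess the salt cannot precompute either result.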

So...go as far as truly cryptographically-sound salt? Don't bother with salt at all?

Copying @tarcieri, @tduehr, @enebo.

headius · 2015-01-12T17:15:17Z

Copying @Who828.

tduehr · 2015-01-12T19:01:59Z

It doesn't need to be cryptographically secure, though that wouldn't hurt. It just needs to be set per Ruby environment and be decently random.

I think the idea is just to avoid collisions with objects from other runtimes. Using a CSPRNG would have a higher runtime cost than something closer to MRI's pointer. Perhaps the runtime's Java hash?

tarcieri · 2015-01-12T19:05:42Z

It does need to be cryptographically secure to avoid hashDoS. That's why SipHash (which effectively computes a cryptographic MAC) is used instead of e.g. universal hashing. That's perhaps a tad conservative outlook, but it's one coming from a guy whose Twitter handle is "hashbreaker" (djb).

All that said, I'm not sure any of what I just said applies to Array#hash. I'm not sure what the attack would be there...

tduehr · 2015-01-12T19:57:33Z

I did some reading on the switch to SipHash right before I saw this comment. You're right, it should come from a CSPRNG where possible. This salt seems intended to prevent precomputing hash values in search of collisions, which is what makes a hashDoS possible.

Also, it does apply to all implementations/overrides of #hash since that's what the attack leveraged in the first place.

That said, the startup hit from a CSPRNG source may be too large on some Java implementations. I think, on those systems only, a "decent" random will be good enough in this case. To me, this suggests an unseeded SecureRandom call to let the JVM determine the best random source and give us a few bytes.
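A rough sketch of the "grab a few bytes once" idea follows. It uses Ruby's `securerandom` standard library as a stand-in for an unseeded `java.security.SecureRandom`. The `HashSalt` module name is hypothetical, and the lazy, non-thread-safe memoization is an assumption made to keep the cost off the startup path.

```ruby
require 'securerandom'

# Hypothetical holder for a per-process salt; the name is illustrative.
module HashSalt
  # Draw 8 bytes from the platform's secure random source the first time
  # the salt is requested, then reuse that value for the rest of the process.
  # Note: this memoization is not synchronized across threads.
  def self.value
    @value ||= SecureRandom.random_bytes(8).unpack1('Q')
  end
end

salt = HashSalt.value
```

Deferring the draw until the first call means a slow entropy source only costs time when a salted hash is actually computed.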

headius · 2015-01-12T20:06:43Z

For performance reasons (and because we don't really believe in hashDoS) JRuby uses perlhash by default, because as far as we've heard there's not a reliable exploit for it. MRI is still using murmurhash for Array#hash calculation, so they're not getting full siphash security either.

I think we all agree we need some salt in there, but I'm not sure we need to go beyond what MRI does with some weakly random salt like a function pointer. @Who828 is working on fixing our Array#hash impl and for the moment I have suggested System.identityHashCode(runtime), which should be different per JVM process.

headius · 2015-01-12T20:07:57Z

Ok, so I'm wrong. We'll need something else that's lightweight and as pseudo-random as a function pointer is for MRI.

~/projects/jruby $ jruby -e "p java.lang.System.identity_hash_code(JRuby.runtime)"
1252169911

~/projects/jruby $ jruby -e "p java.lang.System.identity_hash_code(JRuby.runtime)"
1252169911

~/projects/jruby $ jruby -e "p java.lang.System.identity_hash_code(JRuby.runtime)"
1252169911

tduehr · 2015-01-12T20:22:41Z

Pull the hashCode of the current Runtime.

tarcieri · 2015-01-12T23:30:34Z

@headius your best bet is to grab some cryptographically random value at VM start. I know the JVM likes to read from /dev/random instead of /dev/urandom, so that may be hard. If there's an internal CSPRNG you can grab it from, that'd be fine too.

I think you can just grab it at boot so it's unique per VM instance, like @tduehr was saying

emboss · 2015-01-19T03:44:05Z

IMO a cryptographically secure seed would be the safe choice, as @tduehr and @tarcieri proposed.

For what it's worth, this is how Java 7 used to seed String hashing, and here's yet another way. Both are not cryptographically secure, though.

The hashDoS attack on Murmur 2 and 3 in 2012 didn't even rely on the seed; it would have worked even when seeding with perfectly cryptographically secure random data. According to this article, there are three requirements to counter hashDoS-style collision attacks:

The hash function does not allow multi-collision attacks

That's what a cryptographically secure hash like SipHash or the slower SHA-2/3 etc. functions provide. MurmurHash 2/3 failed in this regard: it was possible to create collisions regardless of the seed. To be fair, it was never designed for this purpose.

The hash function uses at least a per-process hash seed randomization

Quoting the SipHash paper:

On startup a program reads a secret SipHash key from the operating system's cryptographic
random-number generator; the program then uses SipHash for all of its hash tables.

Like @tduehr already mentioned, I too believe that the shorter outputs of SipHash, Murmur or any other general-purpose hash function used in this context could potentially be vulnerable to offline brute force attacks if the seed was fixed or could be predicted easily.
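The offline brute-force concern can be sketched with a toy example. `toy_hash` below is a made-up FNV-1a-style function truncated to 16 bits, not SipHash or Murmur, and the seed value 42 is arbitrary. The 16-bit width is far shorter than any real hash output and is used only so the search finishes instantly. Given a known seed, the attacker simply enumerates keys until two collide.

```ruby
# Toy seeded hash (FNV-1a style), truncated to 16 bits. Illustrative only.
def toy_hash(str, seed)
  h = (0xcbf29ce484222325 ^ seed) & 0xFFFF_FFFF_FFFF_FFFF
  str.each_byte do |b|
    h = ((h ^ b) * 0x100000001b3) & 0xFFFF_FFFF_FFFF_FFFF
  end
  h & 0xFFFF
end

# With a known (or guessed) seed, search offline for two colliding keys.
def find_collision(seed)
  seen = {}
  (0..65_536).each do |i|
    key = "k#{i}"
    h = toy_hash(key, seed)
    return [seen[h], key] if seen.key?(h)
    seen[h] = key
  end
  nil # not reached: 65,537 keys cannot map to 65,536 distinct values
end

known_seed = 42
first, second = find_collision(known_seed)
```

The same search against an unknown, unpredictable seed has nothing to compute against, which is the argument for drawing the seed from a secure source.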

The interface to untrusted potential attackers uses simple, hard limits on the number of keys it will accept

This would obviously be the easiest way to prevent attacks, but it is typically an application-level concern and not applicable universally.
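As an application-level illustration of the third requirement, here is a minimal sketch of a hard cap on the number of keys accepted from untrusted input. The names `parse_params`, `MAX_KEYS`, and `TooManyKeysError`, and the limit of 1,000, are hypothetical and not taken from any framework.

```ruby
# Hypothetical limit and error class, for illustration only.
MAX_KEYS = 1_000
class TooManyKeysError < StandardError; end

# Build a Hash from untrusted key/value pairs, refusing to grow past
# +max_keys+ distinct keys.
def parse_params(pairs, max_keys: MAX_KEYS)
  result = {}
  pairs.each do |key, value|
    if !result.key?(key) && result.size >= max_keys
      raise TooManyKeysError, "more than #{max_keys} keys"
    end
    result[key] = value
  end
  result
end

accepted = parse_params([["a", 1], ["b", 2]], max_keys: 2)
```

A cap like this bounds the work an attacker can force regardless of how the hash function is seeded, but it has to be chosen per application.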

So it seems like a cryptographically secure value is the safest bet. On the other hand, I am not aware of any scheme using a less secure random value having been attacked successfully. Either way, the implementation can be problematic.

headius · 2015-01-26T19:53:12Z

In other commits we have patched the logic that prepares a random number generator, following @nirvdrum's findings and so on. I think we can just pull from the same store to get a good salt and cache that somewhere statically or on a per-instance basis.

@Who828 Can you make that small change to your PR? I discovered that identityHashCode of the current Ruby instance does not change much across runs.

kares · 2017-06-22T18:08:14Z

#2437 has been merged and #2453 has been closed.
By default, JRuby uses a secure-random-generated (long) value as a hashing starter.

headius added core ruby 1.9 JRuby 9000 labels Jan 12, 2015

donv mentioned this issue Oct 9, 2015

Bad value for $$ #3380

Closed

kares closed this as completed Jun 22, 2017

kares added this to the Non-Release milestone Jun 22, 2017

eregon mentioned this issue Jan 1, 2018

Remove SipHash and fix random seed per process with SVM oracle/truffleruby#912

Merged
