Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add (Half)SipHash PRF and integrate it into Crystal::Hasher #5104

Closed
wants to merge 4 commits into from

Conversation

ysbaddaden
Copy link
Contributor

@ysbaddaden ysbaddaden commented Oct 11, 2017

Implements streaming versions of siphash and halfsiphash pseudo random functions as Digest::SipHash and Digest::HalfSipHash with chosen rounds.

Refactors Crystal::Hasher to use SipHash(1, 3) on 64-bit systems and HalfSipHash(1, 3) on 32-bit systems, with tweaks to optimize some performance, e.g. String#hash —but probably did 💩

Update: choice of siphash-1-3 and hafsiphash-1-3 stems from linux kernel choices (see comments below).

Update: I notice I broke the contract of Object#hash which now returns either an UInt32 or an UInt64 depending on the platform. Maybe Digest::HalfSipHash should always compute an UInt64 to match Digest::SipHash (the algorithm allows it)?

end

def slice(value)
@siphash.update(value)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine for Slice(UInt8), maybe okay-ish with other primitives (e.g. booleans, integers, floats), but most probably 💩 for anything else?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, fine for Int::Primitive and Float::Primitive.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akzhan , it is debatable question.
Slice[1, 2, 3].hash.should eq(Slice[1_u64, 2_u64, 3_u64].hash) or should_not?

@RX14
Copy link
Contributor

RX14 commented Oct 11, 2017

We know siphash-1-3 is fine for a hash function, but do we know for sure that halfsiphash-1-3 is?

@ysbaddaden
Copy link
Contributor Author

Do not ever use HalfSipHash except for as a hashtable key function, and only
then when you can be absolutely certain that the outputs will never be
transmitted out of the kernel. This is only remotely useful over jhash as a
means of mitigating hashtable flooding denial of service attacks.

https://github.com/torvalds/linux/blob/master/Documentation/siphash.txt

I guess this fits our need... unless someone starts using #hash and publicly discloses values?

@RX14
Copy link
Contributor

RX14 commented Oct 11, 2017

Ah, so halfsiphash's typical variant is 1-3? That makes it a lot clearer, thanks.

@ysbaddaden
Copy link
Contributor Author

Also, this response by Jean-Philippe Aumasson on linux.kernel that pretty much details the above warning:
https://groups.google.com/d/msg/linux.kernel/9NxJW70-hEU/X7nWWkzDAAAJ

  • use siphash-2-4 for anything that requires cryptography quality —e.g. when disclosing hashes;
  • halsiphash is probably attackable;
  • halsiphash is probably stronger than other solutions, even with 1-3 rounds;
  • you may use halfsiphash-1-3 for mitigating hash flooding attacks —but never disclose hashes!

Implements streaming versions of the siphash family of pseudo random
functions, limited to a fixed result size. Digest::SipHash computes
an UInt64 hash, whereas Digest::HalfSipHash computes an UInt32 hash.
Improves Crystal::Hasher to use the SipHash family of pseudo random
functions for improved security of hashes; for example protecting
against hash flooding DoS attacks.

Since the hashes are never disclosed to a potential attacker, we
rely on the faster 1-3 versions instead of the cryptographically
verified 2-4 versions.

Last but not least, this uses siphash-1-3 on 64-bit systems and
halfsiphash-1-3 on 32-bit systems, for best overall performance.
end

def string(value)
value.to_slice.hash(self)
@siphash.update(value)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Breaking change: "some data".hash and "some data".to_slice.hash no longer have the same hash!

  • String#hash will compress 4-bytes 8-bytes at the time ("some dat", "a");
  • Slice#hash will iterate each byte, casted to UInt64, and finally compressed as 8 bytes ("s" => 115 => 0x0000_0000_0000_0073).

For performance of Hash(String, V) we definitely want a fast String#hash, but we also want 1_u8.hash == 1_u64.hash, but we introduce a difference between String#hash and Bytes#hash. Maybe it's fine, maybe it's not. I don't know.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slice#hash will iterate each byte, casted to UInt64, and finally compressed (as 8 bytes).

Is there a performance penalty to do it byte by byte?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. SipHash will only compress 8 plain bytes at once (4 bytes with halfsiphash) —no more, no less, unless we finalize — if it's smaller or not a multiple of 8 bytes (or 4 bytes with halfsiphash) then we must buffer bytes until we have enough to compress —it adds to the overhead.

Furthermore, encoding "some data" will result in compressing "some dat" then "a" (thus call update once then finalize), whereas "some data".to_slice will call update 9 times (for each byte casted to UInt64).

Note that there is little we can do: Slice(T) may be any T, and we must properly call T#hash(hasher) for each entry.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite think this is a "real" breaking change, as "some data" != "some data".to_slice. People can't depend on the output of hash even after 1.0. They can only depend on the invariants around ==.

Copy link
Contributor

@bew bew Oct 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation

Note that there is little we can do: Slice(T) may be any T, and we must properly call T#hash(hasher) for each entry.

You're right, maybe we could (in an other PR?) add an optimization for some T types, like Bytes (Slice(UInt8)). And/Or more generally to any T that is a Number (and Bool)?

alias HashType = UInt64
{% else %}
alias SipHash = Digest::HalfSipHash
alias HashType = UInt32
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m very unhappy to handle two different hash types on different architectures. Is it really unavoidable?

I should note that you must update StringPool accordingly in this case

@@ -178,6 +178,6 @@ class StringPool
hasher = Crystal::Hasher.new
hasher = str.to_slice(len).hash(hasher)
# hash should be non-zero, so `or` it with high bit
hasher.result | 0x8000000000000000_u64
hasher.result.to_u64 | 0x8000_0000_0000_0000_u64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use 32/64 bit arithmetic here or simply use 64 bit hashes everywhere

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exactly what I propose in the updated PR description.

I chose the easy solution here, but tweaking StringPool to take whatever is easy (except for high not or).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm strongly vote for UInt64 type. Just because I have unfinished work that hardly depends on it (number normalization).

@RX14
Copy link
Contributor

RX14 commented Oct 11, 2017

Perhaps the hash type should always be a UInt64, and we just cast to UInt64 on 32bit. I think changing Object#hash's type on 32bit is probably a bad idea.

@ysbaddaden
Copy link
Contributor Author

I modified halfsiphash to compute 64-bit hashes, which is slower (doubles finalizations), but still faster than siphash (on 32-bit systems). Some unscientific benchmarks (Benchmark.ips, encoding 2165 random english words):

  • 64-bit system (trusty host, llvm 5.0):
    Digest::SipHash(1, 3)  14.02k ( 71.34µs) (± 0.26%)       fastest
Digest::HalfSipHash(1, 3)   11.8k ( 84.77µs) (± 0.75%)  1.19× slower
  • 32-bit system (precise container, llvm 3.8):
    Digest::SipHash(1, 3)   8.46k (118.15µs) (± 0.44%)  1.43× slower
Digest::HalfSipHash(1, 3)   12.1k ( 82.61µs) (± 0.21%)       fastest

@ysbaddaden
Copy link
Contributor Author

CircleCI is broken: ld: library not found for -lgmp.

@akzhan
Copy link
Contributor

akzhan commented Oct 11, 2017

LGTM (but we need optimization for Slice(Primitive) anyway).

@ysbaddaden
Copy link
Contributor Author

Due to how Hasher#string was implemented (iterating each byte of the slice casted to u64), the performance impact of siphash (hashing bytes directly) is contained. Around 33.5µs (siphash) vs 32.5µs (current) on a 64-bit host.

So Crystal::Hasher#result always returns an UInt64 across systems.
Generating 8-bytes instead of 4-bytes hashes doubles the
finalization rounds but is still significantly faster in 32-bit
systems than Digest::SipHash.
@ysbaddaden
Copy link
Contributor Author

For some reason the last commit didn't appear on GitHub. I amended and force pushed it.

Copy link
Contributor

@akzhan akzhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a way to optimize slices. May be Hasher.slice(slice)...

@akzhan
Copy link
Contributor

akzhan commented Oct 13, 2017

Find the way, we should declare

struct Hasher
  def slice(value) : self
    @siphash.update(value)
    self
  end

  def string(value)
    value.to_slice.hash(self)
  end
end

struct Slice(T)
  def hash(hasher)
    hasher.slice(self)
  end
end

and revert Hasher.string to use to_slice.
What do you think, @ysbaddaden?

@ysbaddaden
Copy link
Contributor Author

ysbaddaden commented Oct 13, 2017

Tried already, but Int32.slice(1, 2, 3).hash will fail, since SipHash#update only accepts String and Bytes aka Slice(UInt8).

We could consider a Slice to be a binary blob, and hash it as is:

struct Slice(T)
  def hash(hasher)
    hasher.bytes Bytes.new(to_unsafe.as(UInt8*), size * sizeof(T))
  end
end

But I'm not sure; what if T is Time or LibC::SockaddrIn? Shall we iterate the slice and hash each item as item.hash(hasher)? We may only optimize basic primitives:

struct Slice(T)
  def hash(hasher)
    {% if [UInt8, UInt16, UInt32, UInt64, Int8, Int16, Int32, Int64, Bool, Float32, Float64].includes?(T) %}
      hasher.bytes Bytes.new(to_unsafe.as(UInt8*), size * sizeof(T))
    {% else %}
      super hasher
    {% end %}
  end
end

But maybe we shouldn't care: a slice is a memory blob of binary data, and that's it. We can't because T maybe anything, a primitive, a struct, or a Reference (in which case we should iterate).

@akzhan
Copy link
Contributor

akzhan commented Oct 13, 2017

I prefer to think about Slice as blob (may be with stricture on T.struct? || T.is_a?(Primitive)).

Just because hash author must know where he uses slice(value) method.

@ysbaddaden
Copy link
Contributor Author

ysbaddaden commented Oct 13, 2017

I got the wrong assumption that Slice(T) only accepted primitives and structs for T, but it accepts anything, so yeah, we should filter-out references, but there doesn't seem to be T.primitive? nor T.struct? in TypeNode macro methods.

@asterite
Copy link
Member

I now think that Slice's design is not good. Maybe we should only have Bytes which is a hardcoded version of Slice with UInt8. I don't think there are use cases for Slice where T isn't UInt8. So for now I wouldn't worry about optimizing this for other types of slices.

@RX14
Copy link
Contributor

RX14 commented Oct 13, 2017

I disagree. There's plenty of times where C structs and other int primitives make sense. Please keep Slice generally generic. I'd at least need to hear some concrete arguments against it first.

@ysbaddaden
Copy link
Contributor Author

Since Pointer(Struct) is possible I suppose say that Slice(Struct) makes sense too, if we consider slice to be safe pointers with bound checks.

Maybe aliasing Bytes as Slice(UInt8) is the issue? It's basically the same, but conceptually maybe there is a tiny difference: a chunk of bytes vs a collection of ordered T?

@RX14
Copy link
Contributor

RX14 commented Oct 14, 2017

I still don't see the problem, this is simply a performance enhancement right? The typical method is simply to iterate the slice and call hash(hasher), as with any other collection. We simply perform a performance optimization in the case of primitive numeric types.

@akzhan
Copy link
Contributor

akzhan commented Oct 14, 2017

Btw, is macro system allows to know type.has_inner_pointers?? It allows full optimization of hashing on any slices.

@RX14
Copy link
Contributor

RX14 commented Oct 14, 2017

@akzhan Why? Float doesn't have inner pointers, but it still must use hash(hasher) to ensure it's normalized.

@akzhan
Copy link
Contributor

akzhan commented Oct 14, 2017

@RX14 Slice on T without any references can be interpreted as raw byte array. It's usable for hashing too.

@RX14
Copy link
Contributor

RX14 commented Oct 14, 2017

No, Slice is a collection of items, not a raw byte array. Sometimes those items happen to be bytes.

We can perform optimizations in some cases where we can read the slice in blocks which suit siphash, but that's all it is: an optimization.

class Test
  def ==(other : Test)
    true
  end

  def hash(hasher)
    hasher.bool(true)
  end
end

Slice(Test).new(1, Test.new).hash == Slice(Test).new(1, Test.new).hash

This should always be true even if they're different pointers to different objects because we defined them to be the same using == and hash.

The simple optimization we would perform, that @ysbaddaden already wrote the code for (without float though), preserves this property (all equal ints have one unique bit representation on a given machine), so it's valid.

@akzhan
Copy link
Contributor

akzhan commented Oct 14, 2017

Ok, lets get it merged.

@akzhan
Copy link
Contributor

akzhan commented Oct 15, 2017

@asterite
Copy link
Member

I read more closely the issue. As I said, Slice(UInt8) is by far the most common type of slice. If we can optimize for that case then we should do it. Other slices can use a slower hash for now. That should also optimize String if we use str.to_slice.hash.

@RX14
Copy link
Contributor

RX14 commented Oct 15, 2017

I've just profiled the compiler compiling itself: a full 10% of compiler time is spent in Slice(UInt8)#hash. This method needs to be highly optimized.

@asterite
Copy link
Member

@RX14 What if you profile the compiler before we started making all these changes to hash?

@funny-falcon
Copy link
Contributor

In my version I made only specialization for Bytes aka Slice(Uint8) (and String#hash used it).

I don't see reason this shouldn't be done. At least until crystal has specialized primitive "string_slice".

BTW, don't forget about hashing of HTTP headers.


v1 ^= 0xdd

DROUNDS.times { sipround }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need in whole second finalization. Just increase DROUNDS by 1, and use v1 and v3 separately, without xor.

Given hash-value is not going to be exposed, there is no need in being such strong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I follow the official implementation for halfsiphash to generate an UInt64. I'm not competent enough to change the algorithm.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, believe me at least once.

You've aleady mentioned that #hash should not be exposed. With this limitation my suggestion is absolutely safe. Just believe me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't do it, it's not safe. Just believe me.

@funny-falcon Actually, I do believe you. But if two people come with contradictory arguments and each say "Please believe me" there's no way to tell who is right, and why we should believe one or the other.

Can you provide a proof that doing that is safe?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asterite, its your language, do what you want.
I have no degree in cryptography, still I understand what I'm talking about.
You are just joking.
Nobody uses 32bit now, so this is irrelevant. Just add empty loop here, and no one will complain.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, no one will complain if this function will be much slower than it needs to be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really have nothing against you! But I'm really incompetent to judge.

Those changes also mean that Digest::HalfSipHash would no longer be compliant to the specs, right? Since I want to expose it to developers, then the implementation should be compliant.

Since generating an UInt32 is okayish, maybe we can just return an UInt32 in Digest::HalfSipHash, and merely expand it to an UInt64 in Crystal::Hasher, so we don't have to deal with per-platform types, yet keep the algorithm compliant?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think yes. Hash is for example used in the compiler, and @RX14 benchmarked the compiler and with this change (I think) String#hash takes 10~15% of the time. So a fast hash function is important. But together with that we should probably optimize Hash entirely, avoid invoking #hash for small hashes (just do a linear lookup). And maybe even have a specialized Hash for the compiler where order of insertion doesn't matter.

There's just so much to do...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and merely expand it to an UInt64 in Crystal::Hasher

I think, it is good option. Simple multiplication by non-trivial odd number does the trick.

@ysbaddaden
Copy link
Contributor Author

I pushed a commit to optimize Bytes#hash and keep the iteration for other Slice(T), so for example:

Int32.slice(1, 2, 3).hash == Int64.slice(1, 2, 3)
Int32.slice(1, 2, 3).hash != UInt8.slice(1, 2, 3)

@akzhan
Copy link
Contributor

akzhan commented Oct 19, 2017

@ysbaddaden what about this trick:

struct Crystal::Hasher
  def slice(value) : self
    return self unless value.bytesize
    if value.first.is_a?(Int::Primitive) || value.first.is_a?(Float::Primitive)

p "fast"
      p64 = value.to_unsafe.as(UInt64*)
      size64 = value.bytesize / sizeof(UInt64)
      size64.times do |i|
        @result = @result * 31 + p64[i]
      end
      full_len = size64 * 4
      remainder = value.bytesize & 0x03
      remainder.times do |r|
        @result = @result * 31 + value.to_unsafe.as(UInt8*)[full_len + r]
      end
      self
    else
      self
    end
  end
end

p Crystal::Hasher.new.slice(Slice(UInt8).new(4)).result

Looks like code piece that checks Slice T should be rewritten.

@asterite
Copy link
Member

Wow. I have this benchmark:

words = ["one", "two", "hello world", "welcome", "good bye", "さようなら"]

hash = {} of String => Int32
words.each do |word|
  hash[word] = word.bytesize
end

time = Time.now
a = 0
10_000_000.times do
  words.each do |word|
    a += hash[word]
  end
end
puts Time.now - time

In Crystal 0.23.1 it takes 0.95s. In master (because we now randomize hashes) it takes 1.69s. And with this PR it takes 4.27s!

But then, the same benchmark in Ruby takes 12.98s, so it might be ok. However, in Go it takes 0.76s, so I guess we better optimize our Hash and #hash algorithms. I checked and Go doesn't seem to use siphash.

@akzhan
Copy link
Contributor

akzhan commented Oct 19, 2017 via email

@funny-falcon
Copy link
Contributor

On my laptop:

yura@dell:~/Project/crystal$ ~/tmp/bench_master
00:00:01.582613000
yura@dell:~/Project/crystal$ ~/tmp/bench_master
00:00:01.586930000
yura@dell:~/Project/crystal$ ~/tmp/bench_master
00:00:01.605560000
yura@dell:~/Project/crystal$ ~/tmp/bench_mine
00:00:00.9957240
yura@dell:~/Project/crystal$ ~/tmp/bench_mine
00:00:01.0005630
yura@dell:~/Project/crystal$ ~/tmp/bench_mine
00:00:01.0528370

bench_mine - is a hasher1 branch

@funny-falcon
Copy link
Contributor

yura@dell:~/Project/crystal$ ~/tmp/bench_minehash                                                                                              
00:00:02.1886980
yura@dell:~/Project/crystal$ ~/tmp/bench_minehash
00:00:02.1599750
yura@dell:~/Project/crystal$ ~/tmp/bench_minehash
00:00:02.2136030

bench_minehash version with my hash function.
(bench_mine used just my hasher with almost same hash function as master).

All is compiled with --release

@funny-falcon
Copy link
Contributor

yura@dell:~/Project/crystal$ ~/tmp/bench_minehash_old
00:00:01.0748730
yura@dell:~/Project/crystal$ ~/tmp/bench_minehash_old
00:00:01.0996810
yura@dell:~/Project/crystal$ ~/tmp/bench_minehash_old
00:00:01.1796260

With hash function I initially crafted for Crystal.

@ysbaddaden
Copy link
Contributor Author

ysbaddaden commented Oct 19, 2017

Here are my results (Ubuntu 14.04, 64-bit), using benchmark.ips and 2165 random english words. Master is slightly slower than 0.23.1, while this branch is twice slower:

test  10.12k (  98.8µs) (± 0.44%) fastest           [0.23.1]
test   9.13k (109.57µs) (± 0.28%) fastest           [master]
test    5.8k (172.48µs) (± 1.39%) fastest           [siphash]

Note: we can most surely tweak / rework the algorithm to enable some LLVM optimizations.

@funny-falcon
Copy link
Contributor

I checked and Go doesn't seem to use siphash.

Go uses as fast function as its author could made.
It is not crypto-strong even nearly (despite it uses aes-inc primitive if present).
But it is resistant to hash-flood given hash-value is not exposed.
(And Go's runtime never exposes it).
And they use per-table random seed combined with application level random seed.

So, it is your choice what to use:

  • buzz-word crypto-strong but slow hash function,
  • or just enough strong fast function.

@asterite
Copy link
Member

I think I prefer strong enough fast function :-)

@akzhan
Copy link
Contributor

akzhan commented Oct 19, 2017

@funny-falcon can you send PR with your fastest implementation?

I'm always can get hasher1, of course.

With no normalization please

@funny-falcon
Copy link
Contributor

hasher1 is almost the same as current hasher, so it is as dumb.
I will prepare version with funny_hash64.

@akzhan
Copy link
Contributor

akzhan commented Oct 19, 2017

@asterite @funny-falcon I think that we

  1. must incorporate fastest HashDoS defence algorithm from @funny-falcon
  2. give SipHash digest algorithms as option (not used by Crystal)
  3. rework hashes using open addressing, as initially designed.
  4. introduce Python-inspired number normalization (optional, on the way).

Every step is by another PR.

@funny-falcon
Copy link
Contributor

funny-falcon commented Oct 19, 2017

I've made #5146 with funny_hash64 like function.
Surpisingle, I could not make faster than current dumb hasher for the benchmark above.
But it is not slower either. Looks like, with relatively fast function bottleneck is somewhere else.

Fastest result above were achieved on branch from older master commit with my hasher interface and 2*32bit state.
I don't know which factor is most significiant: older master commit, my hasher interface or 2*32bit state :-(
And my wife will bite me cause I'm still not at home. So, I'm done for today.

@akzhan
Copy link
Contributor

akzhan commented Oct 30, 2017

@ysbaddaden looks this PR should be modified to no touch Crystal::Hasher directly. Just alternative digests.

@asterite
Copy link
Member

Now that the alternative hash function has been merged, I don't think there's a need for this in the standard library (it could be a useful shard, though).

@RX14 RX14 closed this Oct 30, 2017
@ysbaddaden
Copy link
Contributor Author

It was already: https://github.com/ysbaddaden/siphash.cr

@ysbaddaden ysbaddaden deleted the std-siphash branch June 27, 2018 08:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants