-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
change computation of hash value. #4621
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a decent idea in general but the implementation is very weak. You should also modify def_hash
src/object.cr
Outdated
end | ||
|
||
# Protocol method for safe hashing | ||
abstract def hashme(hasher) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be renamed to just hash(hasher)
and let method overloading do the work. The precedent is in how to_s
and to_json
work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RX14 , how could I distinguish def hash(hasher)
from def hash
with responds_to?(:hash)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@funny-falcon just implement def hash(hasher) in base classes to not use responds_to?
.
src/object.cr
Outdated
@@ -58,13 +58,19 @@ class Object | |||
nil | |||
end | |||
|
|||
# Generates an `Int` hash value for this object. | |||
# Generates an `Int` hash value for this object (usually `UInt32`) | |||
# | |||
# This method must have the property that `a == b` implies `a.hash == b.hash`. | |||
# | |||
# The hash value is used along with `==` by the `Hash` class to determine if two objects | |||
# reference the same hash key. | |||
abstract def hash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this line, it's redundant because of the method definition below.
elsif self == to_u64 | ||
to_u64 | ||
else | ||
(self % 0x7fffffffffffffe7_i64).to_i64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this constant deserves a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is prime number.
src/big/big_int.cr
Outdated
@@ -267,8 +267,22 @@ struct BigInt < Int | |||
to_s io | |||
end | |||
|
|||
private def hash_normalize | |||
if self == to_i64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These first two conditions are weird to me, can you explain what's going on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment to Object#hash
:
This method must have the property that
a == b
impliesa.hash == b.hash
$ bin/crystal eval 'require "big"; b=BigInt.new(-1000); p b.to_i64, b.to_u64'
Using compiled compiler at `.build/crystal'
-1000
1000
So I can't use just to_u64
. Nor I can just use to_i64
cause of numbers between INT64MAX and UINT64_MAX. Although, probably I can use self & 0xffffffffffffffff_u64
, but it will be tweaky.
src/enum.cr
Outdated
@@ -275,8 +275,16 @@ struct Enum | |||
end | |||
|
|||
# Returns a hash value. This is the hash of the underlying value. | |||
# Enum hash is exception from random seeding, cause several Hashes are | |||
# initialized on very early stages of bootstrap, and StdHasher seed is not | |||
# filled yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might be able to work around this by adjusting the require order in the prelude.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I'm not too convenient in Crystal.
Issues are cause of:
enum Signal
used insrc/event/signal_handler.cr
enum LLVM::Attribute
defined insrc/llvm/events/cr
(don't know, where it is used)class Thread
is also used very early insrc/thread.cr
:@@threads = Set(Thread).new
If you point me how to reorder prelude to fill StdHasher
@@seed
before this hashes constructed, it will be great.
It could be fixed also with defining special-cased Hash, if def hashkey(key); key.hash; end
will be defined, and then overloaded in subclass specially for this use cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is src/prelude.cr
, but I doesn't check for.
src/http/headers.cr
Outdated
c = normalize_byte(c) | ||
h = 31 * h + c | ||
v = (v<<8) | normalize_byte(c).to_u32 | ||
if v >= 0x1000000_u32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also has no comment or explanation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it just "take four chars in a row". I could use counter here. Just was lasy to introduce another one variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Write a comment, please :)
src/stdhasher.cr
Outdated
struct StdHasher | ||
@@seed = StaticArray(UInt32, 7).new{|i| 0_u32} | ||
def self.init | ||
raise "#{@@seed[0]} #{@@seed[1]}" unless @@seed[0] == 0_u32 && @@seed[1] == 0_u32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just sanity check remained from debugging. I'll remove it.
src/stdhasher.cr
Outdated
# Also it has specialized methods for primitive keys with different seeds. | ||
struct StdHasher | ||
@@seed = StaticArray(UInt32, 7).new{|i| 0_u32} | ||
def self.init |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just make this part of the @@seed
initializer and avoid having an init method? Do you see any other init methods called from the prelude?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It works with just filling in StdHasher body. init could be removed.
end | ||
|
||
def initialize | ||
@h = Pointer(Impl).malloc(1, Impl.new(@@seed[0], @@seed[1])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly what's the point of this it just seems to me you've replicated a class
exactly by using 2 structs...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not!!!! See StdHasher.build
method: it can allocate Impl on a stack. There is no way to allocate class
on a stack. Compiler is NOT smart enough!!!
If you want, I may remove StdHasher#initalize()
at all, so there will be no malloc
in this file.
I don't use it. I put it just for completeness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need pointers (nor allocate HEAP memory). As long as you access @h
(not through an accessor) you can mutate @h
properties directly:
struct Hasher
struct Impl
property v : Int32
def initialize(@v)
end
end
def initialize(seed)
@h = Impl.new(seed)
end
def <<(v)
@h.v += v
end
def digest
@h.v
end
end
hasher = Hasher.new(0)
hasher << 1
hasher << 4
p hasher.digest # => 5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initializers could then be reduced to:
def self.new
new(@@seed[0], @@seed[1])
end
protected def initialize(a, b)
@h = Impl.new(a, b)
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Method chaining is nice sugar but not really necessary. Your suggestions for alternate names would all be without chaining as well. I don't see that <<
would imply chainability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway class/struct creators will have knowledge to return Hasher
in hash, for example.
Moreover it's just hasher << whaaaaat
pattern.
And compiler will help with type stricture.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, @RX14 already said about def_hash
, looks like 80+% authors just will use this macro.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@akzhan, cause it is easy to break with chaining:
# good
def hash(hasher)
hasher << @a << @b
end
# wrong, but compiles without error
def hash(hasher)
hasher << @a << @b
hasher << @c
end
# corrected
def hash(hasher)
hasher = hasher << @a << @b
hasher << @c
end
So, resolution:
- default hasher is a struct without pointers (for safety),
- it causes that
def hasher
have to return hasher, <<
is used, but without chaining (to reduce usage errors),
Is it correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
src/stdhasher.cr
Outdated
self | ||
end | ||
|
||
def <<(v : Int8|Int16|Int32|UInt8|UInt16|UInt32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use crystal tool format
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, use bin/crystal tool format
, it's changed in master branch.
why hashme? but not |
@kostya , how could I distinguish |
is responds_to? needed?, may be implement default hash it on Object? |
src/big/big_rational.cr
Outdated
@@ -143,6 +143,10 @@ struct BigRational < Number | |||
to_f64.hash | |||
end | |||
|
|||
def hashme(h) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hasher
should better be spelled out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok (and for othet same notes).
src/big/big_int.cr
Outdated
hash_normalize.hash | ||
end | ||
|
||
def hashme(h) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hasher
should better be spelled out.
src/big/big_float.cr
Outdated
@@ -39,6 +39,10 @@ struct BigFloat < Float | |||
to_f64.hash | |||
end | |||
|
|||
def hashme(h) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hasher
should better be spelled out.
src/float.cr
Outdated
hash_normalize.hash | ||
end | ||
|
||
def hashme(h) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hasher
should better be spelled out.
src/float.cr
Outdated
hash_normalize.hash | ||
end | ||
|
||
def hashme(h) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hasher
should better be spelled out.
src/http/headers.cr
Outdated
StdHasher.hashit self | ||
end | ||
|
||
def hashme(h) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hasher
should better be spelled out.
src/indexable.cr
Outdated
def hash | ||
reduce(31 * size) do |memo, elem| | ||
31 * memo + elem.hash | ||
def hashme(h) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hasher
should better be spelled out.
src/xml/namespace.cr
Outdated
@@ -4,8 +4,8 @@ struct XML::Namespace | |||
def initialize(@document : Node, @ns : LibXML::NS*) | |||
end | |||
|
|||
def hash | |||
object_id | |||
def hashme(hasher) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this perhaps be generalized? Like if a type has method object_id
, use this for calculating the hash, whether it is a reference or not.
src/float.cr
Outdated
@@ -148,8 +148,17 @@ struct Float32 | |||
Printer.print(self, io) | |||
end | |||
|
|||
private def hash_normalize | |||
t = to_i32 | |||
t == self ? t : unsafe_as(Int32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is bad idea, but I know, that current implementation is also weird.
Will be replaced with #4581 I hope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is neccessary cause of definition of hash
function: "if a==b then hash(a)==hash(b)".
Without it Hash(Int32|Float32, V)
is broken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does #4581 fix hashing of Float32|Float64
? If so, then it could be better to apply.
Do you want I include it to my patch, or they will be applied in some order?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@funny-falcon Float hashes in pyhash
manner are identical to integer values if no fraction part exists. Feel free to include this, of course.
p [1, 2, 3, 4, 1.0, 2.0_f64, 3.0_f32, 4.0].map &.hash
[1, 2, 3, 4, 1, 2, 3, 4]
p [1, 2, 3, 4, 1.0, 2.0_f64, 3.0_f32, 4.1].map &.hash
[1, 2, 3, 4, 1, 2, 3, 1932733648]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh no, there is one exception - -1 value. Will fix now. There is one line fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean more 4.25_f64.hash == 4.25_f32
? If not, then my variant is simpler. If yes, then it is better to use more correct variant.
(note: I believe 4.1_f64 != 4.1_f32
in any case, but I could be mistaken).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0.5_f32.hash == 0.5_f64.hash # => true
0.75_f32.hash == 0.75_f64.hash # => true
4.1_f32.hash == 4.1_f64.hash # => false - because values differ.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! Ok, I will include that function in next version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be strict:
puts "%.20f" % 4.1
puts "%.20f" % 4.1_f32
4.09999999999999964473
4.09999990463256835938
src/hash.cr
Outdated
hash = size | ||
def hashme(hasher) | ||
hasher << size | ||
dgst = hasher.digest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need no shortcuts. digest
sounds better than dgst
.
@@ -262,11 +262,6 @@ struct JSON::Any | |||
end | |||
|
|||
# :nodoc: | |||
def hash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why it is removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beause Struct
already provides correct hash(hasher)
.
src/int.cr
Outdated
@@ -317,7 +317,15 @@ struct Int | |||
end | |||
|
|||
def hash | |||
self | |||
if self.is_a? Primitive |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In which case could Int
not be a primitive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BigInt.
@@ -0,0 +1,248 @@ | |||
require "secure_random" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SecureRandom is stdlib, not corelib, it shouldn't be injected into prelude (indireclty).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But without "secure_random" no safety for hash table.
What do you suggest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use Crystal::System::Random
directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ysbaddaden, big thanks!!!! I'll do.
btw, it will be ok if I'll push with force this PR? I prefer to keep history clean. |
Appending of commits is OK. Result will be squashed and reworded anyway. |
src/stdhasher.cr
Outdated
end | ||
|
||
def <<(v : UInt64) | ||
high = (v>>32).to_u32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, Crystal@master has new value.unsafe_as(UInt32)
method. AFAIR it's faster for these cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it? Here is integer to integer conversion. Isn't it compiler primitive already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK, No.
Crystal compiler is lower than 1.0, so lot of optimizations doesn't exist yet.
You have i64.i32 conversion here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wrong :) Just found these primitives :)
src/stdhasher.cr
Outdated
if high != 0 | ||
self << high | ||
end | ||
self << v.to_u32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this code isn't compatible with #4581. There is 31/62bit divider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have commented Hashing::Type
but you may introduce it as alias StdHasher::Type = Int32
.
So please update it, you need these lines for 64bit integers.
private HASH_BITS = 31 # sizeof(StdHasher::Type) >= 8 ? 61 : 31
exp = exp >= 0 ? exp % HASH_BITS : HASH_BITS - 1 - ((-1 - exp) % HASH_BITS)
x = ((x << exp) & HASH_MODULUS) | x >> (HASH_BITS - exp)
I suppose that StdHasher should have something like StdHasher::ValueType alias to declare hash value type. |
To protect against Hash DoS, change the way hash value is computed. Class|Struct should define method `def hashme(hasher)` and call `hasher << @ivar` inside. As an option, for speed, and for backward compatibility, `def hash` still could be implemented. It will be used for Hash of matched type. `Thread#hash` and `Enum#hash` is implemented as unseeded cause they are used before `StdHasher @@seed` is initialized. But it is better to implement `def hashme(hasher)`. StdHasher is default hasher that uses `hashme` and it is used as default seeded hasher. It also implements `fasthash` used for fast hash of primitive types, and `unseeded` for `Enums`. Fixes crystal-lang#4578 Prerequisite for crystal-lang#4557
though Enum#hash could be re-introduced later for performance.
doesn't include crystal-lang#4581 yet (number normalization)
I pushed intermediate changes. |
bytesize = name.bytesize | ||
slice = name.to_slice | ||
hasher.raw(bytesize) | ||
while c + 4 < bytesize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use idiomatic crystal (no impact on performance):
slice = name.to_slice
0.step(to: slice.size, by: 4) do |c|
# ...
end
slice.each do |c|
# ...
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slice looks to allocate Array. Am I mistaken?
Computing hash should not do heap allocation.
Where is step
defined? I didn't find it in 'int.cr' (while upto
were there) and gave up.
Ah, I see, it is in number.cr
. My bad. I'll fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, Slice is (Pointer+Size) struct, value type. no heap allocation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, a slice is basically a pointer with a size (so accesses are bounded). No more, no less. Slices can allocate HEAP memory with Slice(T).new(size)
or bounded to view any memory (HEAP or stack) with Slice(T).new(pointer, size)
, which is the case here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry, i were asleep a bit. I thought about each_slice
. Btw, can Slice have overloaded each_slice?
a = rotl(a, 13) | ||
b ^= a + s | ||
b *= 9 | ||
aa.value, bb.value = a, b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is idiomatic C, not Crystal. Instead of manually assigning to locals then reassign to pointers, pass by value directly:
def permute(v, s, a, b)
{a, b}
end
a, b = permute(v, s, a, b)
Or make it a macro, and assume s
, a
and b
variables:
private macro permute(v)
end
Anyway, LLVM will inline the permute
call anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Scratch that, why even pass a
and b
instead of using @a
and @b
directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I use local variables instead of instance variables to reduce redundant stores to instance variables. I looked to assembly and looks like Crystal/LLVM is not aggressive enought in removing this stores. Especially it is visible in a loop of Bytes permutation.
I use pointers, cause LLVM does bad when to 32bit variables are packed into struct/tuple - it combines them into 64bit argument/result, and function inlining is not able to unsplit them. It is better said, LLVM lacks 'local var struct decomposition' (i don't remember exact name) ie when struct is function local (maybe after inlining), its fields become independent variables. GCC has such optimisation.
Macro is a great advise! Thanks! I'll try it.
current_thread = @@threads.each_key do |thread| | ||
if LibC.pthread_equal(thread.id, current_thread_id) != 0 | ||
break thread | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Why use a custom
Hash
instead ofSet
? - Why drop idiomatic
find
for a custom iteration/break?
It's faster isn't an acceptable response: stdlib should be able to use idiomatic Crystal with no impact on performance, (otherwise it means apps should do the same: custom over idiomatic). Here we want a Set and Set#find
not a custom value-less Hash and custom loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to overload Hash#hash_key
, cause this file runs before StdHasher @@seed
initialized. In previous version I overloaded Thread#hash(hasher)
to use unseeded permutation.
Which way is better? I'll bind to whatever you prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I don't mind return unseeded Enum#hash
. Doubtfully it could be attack vector. Then I could revert Hash subclass in signal_handler either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overloading #hash(hasher)
is simpler and avoids additional types. If it achieves the same, let's go for it. I doubt it could be an attack vector for Thread
or Signal
, and they're only supposed to be used internally.
Note: did you really mean Enum#hash
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I mean Enum#hash
, but I can go with Signal#hash
instead. Whatever you prefer.
end | ||
|
||
def digest(seed) | ||
a, b = @a, @b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need local values, use @a
and @b
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want digest
to not mutate hasher. So, that is not optimisation, but semantic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I misread, and thought they were reassigned later. My bad.
u = buf.to_unsafe | ||
a, b = @a, @b | ||
i = bsz.unsafe_div 4 | ||
while i > 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, idiomatic Crystal:
i.downto(1) do
# ...
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'll try it.
bsz = buf.size | ||
v = bsz.to_u32 << 24 | ||
u = buf.to_unsafe | ||
a, b = @a, @b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need local values, use @a
and @b
@ysbaddaden it's here where a & b are reassigned at the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, but I looked into generated assembly, and I'm sure enough local variables are needed in Bytes permutation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's here where a & b are reassigned at the end.
you looked to different function.
@@ -73,12 +73,12 @@ struct Struct | |||
# Returns a hash value based on this struct's instance variables hash values. | |||
# | |||
# See also: `Object#hash` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old documentation of hash
is detached from anything, you should remove it if you don't need it.
@@ -18,8 +18,10 @@ struct Symbol | |||
# Generates an `Int32` hash value for this symbol. | |||
# | |||
# See also: `Object#hash`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, old documentation is detached.
def hash | ||
object_id | ||
def hash(hasher) | ||
hasher << object_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you use hasher << object_id
where in XML::Node#hash
and XML::NodeSet#hash
you use hasher.raw object_id
, why is it different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ops, missed it when changed there.
it doesn't affect correctness, but for performance it is better to change to raw
.
thanks.
@funny-falcon Looks like |
@akzhan, I don't agree hasher is superfluous. I think it is not pretty but important. Note: current "h = h*33 + next.hash" is also kind of hasher, but very simple and unsafe. Hasher is also an optimization:
Most hard thing is handling of |
Concerning optimization: on the other hand, hasher introduces dependency between elements, and there could be fast function that is "only finalizer", and it could be calculated using instruction parallelism. But I like hasher idea, and will try to finish this PR in that way. |
Ok, got it :) |
It's gr8 there's so much technical discussion about the topic, but merit aside, it became IMO already way too confusing with comments scattered all around the code, multiple force-pushes, word abbreviations and one letter variable names. I'd suggest opening a new PR with cleaned up implementation and API based on the comments above, with one letter variable names changed to something readable without digging into the code and abbreviations ( |
If |
To protect against Hash DoS, change the way hash value is computed.
Class|Struct should define method
def hashme(hasher)
and callhasher << @ivar
inside.As an option, for speed, and for backward compatibility,
def hash
still could be implemented. It will be used for Hash of matched type.
Thread#hash
andEnum#hash
is implemented as unseeded cause they areused before
StdHasher @@seed
is initialized.But it is better to implement
def hashme(hasher)
.StdHasher is default hasher that uses
hashme
and it is used as defaultseeded hasher. It also implements
fasthash
used for fast hash ofprimitive types, and
unseeded
forEnums
.Fixes #4578
Prerequisite for #4557