Change Hash implementation #5256

funny-falcon · 2017-11-07T21:27:22Z

use array of entries + open addressing index of entries.

RX14

Some name changes and an explanatory comment on how it works would be invaluable for reviewing this.

RX14 · 2017-11-07T21:44:56Z

src/hash.cr

-  @buckets_size : Int32
-  @first : Entry(K, V)?
-  @last : Entry(K, V)?
+  @sz : UInt8


This is badly named.

RX14 · 2017-11-07T21:47:10Z

src/hash.cr

  end

  # Similar to `#first_key?`, but returns its value.
  def first_value?
-    @first.try &.value
+    unless empty?


`return nil if empty?` / `return yield if empty?` is more readable; ditto further down the page.

RX14 · 2017-11-07T21:49:24Z

src/hash.cr

-  protected def find_entry(key)
-    return nil if empty?
+  @[AlwaysInline]
+  private def iter_entries


each_entry?

RX14 · 2017-11-07T21:49:54Z

src/hash.cr

+    @sz = newsz
+  end
+
+  private def need_shrink(size : Int32, sz : UInt8) : Bool


needs_shrink??

RX14 · 2017-11-07T21:50:57Z

src/hash.cr

+
+  private def nindex(sz)
+    mask = SIZES[sz].indexmask
+    mask + (mask != 0 ? 1 : 0)


mask == 0 ? 0 : 1

RX14 · 2017-11-07T21:51:12Z

src/hash.cr

+    (h >> 1) + 1
+  end
+
+  private def nindex(sz)


This isn't a very crystally name.

funny-falcon · 2017-11-07T22:05:21Z

Worse problem: it doesn't look like it is faster :-(
While some microbenchmarks I've made show it is a bit faster, http helloworld is slower, and std_spec is also a bit slower.

And 32bit compiler test didn't pass in Travis-CI.

funny-falcon · 2017-11-07T22:32:42Z

Looks like reallocation costs too much.
It would probably be better if entries were allocated in chunks instead of as one huge array.
I'll close this PR for now until I improve it.

akzhan · 2017-11-08T04:31:16Z

I think that Hash may have two different implementations for small (<7) and other sizes.

I'm sure that small hashes have a big value.

asterite · 2017-11-09T11:30:48Z

I benchmarked this PR and it works twice as fast as the current Hash. Here's my benchmark:

a = {} of Int32 => String

strings = Array.new(10_000_000, &.to_s)

time = Time.now
strings.each_with_index do |s, i|
  a[i] = s
end
puts Time.now - time

Remember to run this with bin/crystal in master and bin/crystal with your branch, so we are sure both use funny hash.

It would be interesting to see your benchmarks.

I think this Hash should be more efficient because in the current Hash every time you add a key-value pair there's an allocation for the node. In your implementation that doesn't seem to be the case.
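
To make the allocation argument concrete, here is a minimal sketch of the "array of entries + open addressing index" layout named in the PR description. It is not the code from this PR: `CompactHash`, the linear probing, the 1/2 load factor and the lack of deletion are all assumptions chosen to keep the example short. Inserting a pair appends to existing arrays instead of allocating a node per entry.

```crystal
# Hypothetical illustration; not the PR's implementation.
class CompactHash(K, V)
  EMPTY = -1

  def initialize
    @keys = Array(K).new                # entries, in insertion order
    @values = Array(V).new
    @index = Array(Int32).new(8, EMPTY) # slot -> position in @keys/@values
  end

  def size
    @keys.size
  end

  def []=(key : K, value : V)
    slot = find_slot(key)
    pos = @index[slot]
    if pos == EMPTY
      # New key: append the entry and record its position in the index.
      @index[slot] = @keys.size
      @keys << key
      @values << value
      rebuild_index if @keys.size * 2 > @index.size
    else
      @values[pos] = value
    end
    value
  end

  def []?(key : K) : V?
    pos = @index[find_slot(key)]
    if pos == EMPTY
      nil
    else
      @values[pos]
    end
  end

  # Linear probing: returns the slot holding `key`, or the first empty slot.
  private def find_slot(key : K) : Int32
    mask = @index.size - 1
    slot = (key.hash & mask.to_u64).to_i32
    loop do
      pos = @index[slot]
      return slot if pos == EMPTY || @keys[pos] == key
      slot = (slot + 1) & mask
    end
  end

  # Doubles the index and re-inserts every entry position.
  private def rebuild_index
    @index = Array(Int32).new(@index.size * 2, EMPTY)
    @keys.each_with_index do |key, pos|
      @index[find_slot(key)] = pos
    end
  end
end

h = CompactHash(String, Int32).new
20.times { |i| h["k#{i}"] = i }
puts h["k7"]?
```

Only the index is rebuilt on growth here; the entry arrays grow by ordinary array reallocation, which is the cost discussed in the following comments.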

funny-falcon · 2017-11-09T13:07:55Z

It is expected for single huge hash to be faster.

But http helloworld (from the crystal-lang.org main page) produces many small hashes.

RX14 · 2017-11-09T15:04:40Z

Testing compiler compile performance (with --no-codegen) before and after is likely to be a good indicator of whether this hash is better just in theory or in practice. Microbenchmarks are only so useful.

I don't like the idea of having a rigid cutoff point with 2 different hash implementations. There must be a way to make it fast at any size.

Another optimization I would love to make is having only 1 malloc call for an empty hash, as at least in the rust compiler I know they found that hashtables being created, never used, and then discarded was fairly common.

funny-falcon · 2017-11-09T15:39:42Z

I have an idea to use chunked allocation, i.e. allocate in chunks of 8 entries.
It will probably be faster than the current hash implementation, and not much slower than this PR on huge hashes. I will investigate it.
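
A minimal sketch of what chunked entry storage could look like, assuming a fixed chunk of 8 entries as proposed above. `ChunkedStore` and its methods are hypothetical names for illustration and are not the code from this PR. Appending never copies existing entries; only the much smaller array of chunk references can reallocate.

```crystal
# Hypothetical illustration of chunked storage; not the PR's implementation.
class ChunkedStore(T)
  CHUNK = 8

  getter size = 0

  def initialize
    @chunks = Array(Array(T)).new
  end

  # Appends a value, allocating a new chunk of CHUNK slots only when
  # the previous chunk is full. Existing chunks are never moved.
  def <<(value : T)
    @chunks << Array(T).new(CHUNK) if @size % CHUNK == 0
    @chunks.last << value
    @size += 1
    self
  end

  # Entry i lives in chunk i // CHUNK at offset i % CHUNK.
  def [](i : Int32) : T
    @chunks[i // CHUNK][i % CHUNK]
  end
end

store = ChunkedStore(Int32).new
20.times { |i| store << i * i }
puts store[17]
```

The trade-off is one extra indirection per lookup in exchange for growth that does not reallocate the whole entries array.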

RX14 · 2017-11-09T16:12:06Z

@funny-falcon Assuming that a larger chunk size would increase performance, you could change chunk size based on the size of the hash to keep the memory overhead bounded when small and performance overhead minuscule when large.

akzhan · 2017-11-10T15:24:26Z

What about implementing a pessimistic downsizing policy?

funny-falcon · 2017-11-10T17:12:09Z

@akzhan , what do you mean?

akzhan · 2017-11-10T19:14:46Z

Do relocations on increase, but rarely on decrease.

funny-falcon · 2017-11-11T06:47:42Z

That was already in this patch: it resized down only if size < capacity / 4
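
The rule described here can be sketched as a small capacity policy. `next_capacity` is a hypothetical helper, and the doubling/halving steps and the minimum capacity of 8 are assumptions for illustration; only the `size < capacity / 4` shrink threshold comes from the comment above.

```crystal
# Hypothetical helper illustrating "grow eagerly, shrink rarely".
def next_capacity(size : Int32, capacity : Int32) : Int32
  if size >= capacity
    capacity * 2  # grow as soon as the table is full
  elsif capacity > 8 && size < capacity // 4
    capacity // 2 # shrink only when less than a quarter is used
  else
    capacity      # otherwise keep the current allocation
  end
end

puts next_capacity(16, 16)
puts next_capacity(3, 16)
puts next_capacity(4, 16)
```

Because the grow and shrink thresholds are far apart, a hash that hovers around one size does not reallocate on every insert and delete.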

funny-falcon · 2017-11-12T19:38:12Z

Looks like the chunked version is not slower than the current implementation even on http hello world, so I am reopening the pull request.

RX14 · 2017-11-12T22:57:47Z

@funny-falcon Please give us some documentation on how this works to work with!

Sija · 2017-11-12T20:58:39Z

src/hash.cr

+    @rebuild_num = 0_u16
+    @first = 0_u32
+    @last = 0_u32
+    @index = Pointer(UInt32).new(0)


I'd suggest using Pointer.empty.

You mean Pointer.null.

@RX14 right, my bad.

Sija · 2017-11-12T21:00:12Z

src/hash.cr

    @block = block
+    unless initial_capacity.nil?


if initial_capacity

it may be zero, isn't it?

Only nil, Pointer.null and false are falsey in Crystal. 0 is truthy, which means the only practical type where you need to use Object#nil? is Bool?, which is fairly rare and a situation often better described with an enum.
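
A short illustration of the truthiness rule under discussion. `describe_capacity` is a hypothetical method written for this example; it shows that `0` takes the truthy branch, so `if initial_capacity` only filters out `nil`.

```crystal
# Hypothetical method; demonstrates that 0 is truthy in Crystal.
def describe_capacity(initial_capacity : Int32?) : String
  if initial_capacity
    # Inside this branch the compiler narrows the type to Int32.
    "capacity #{initial_capacity}"
  else
    "default capacity"
  end
end

puts describe_capacity(0)
puts describe_capacity(nil)
```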

yxhuvud · 2017-11-13T09:38:27Z

src/hash.cr

+    chunks = chunks_ptr
+    @first.upto(@last - 1) do |i|
+      chunk = chunks[i / CHUNK]
+      entry = chunk + i % CHUNK


I'm not certain there is a noticeable difference or if llvm is smart enough to notice this by itself, but perhaps

chunk_index, offset = i.divmod(CHUNK)

to save a division.

LLVM should optimize division and modulo by 8 into a shift and a mask (i.e. i >> 3 and i & 7).
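
For reference, a small check that the three forms agree when the chunk size is 8. `CHUNK` mirrors the constant in the diff; everything else is illustrative. Note that current Crystal uses `//` for integer division, whereas the diff above predates that change and uses `/`.

```crystal
CHUNK = 8

i = 29
chunk_index, offset = i.divmod(CHUNK)

puts chunk_index == i // CHUNK # same as floor division
puts offset == i % CHUNK       # same as modulo
puts chunk_index == i >> 3     # shift, valid because CHUNK is 2**3
puts offset == i & 7           # mask with CHUNK - 1
```

Whether the explicit `divmod` is any faster is not established in this thread; the point is only that all three spellings compute the same chunk index and offset for a power-of-two chunk size.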

akzhan · 2017-11-14T20:38:24Z

Is it GTG?

funny-falcon · 2017-11-14T21:21:08Z

It plays badly with the conservative GC on 32-bit platforms:

Stored hash sums can be mistaken for pointers (the weakptr test failed on Travis CI). That is why, in the first version, I stored hash sums in a separate array (the GC then knows not to look inside it).
The index array is five to ten times larger than the bins array in the current implementation. It looks like the GC doesn't like such big arrays.

So, while small scripts will still work, larger applications become less reliable on 32bit platform with this patch.

So, if support of 32 bit platform is important, this patch should be improved. I don't know, how it should be done. I'm still thinking.

funny-falcon · 2017-11-14T21:23:00Z

It is doubtful that Crystal will gain a precise GC soon.

akzhan · 2017-11-14T21:34:00Z

Have you seen #5271?

pnloyd · 2017-11-15T00:03:21Z

@akzhan, not totally clear if anyone can or wants to push that forward ATM though.

RX14 · 2017-11-17T16:41:14Z

I wouldn't focus on the performance of 32-bit too much, but it should work and be stable. 32bit is almost obsolete on x86, but on arm it'll still be a while until 32bit SoCs stop being sold.

funny-falcon · 2017-11-17T17:27:52Z

32bit version works. It just doesn't play well with conservative GC.

I may make the 32bit version more "workable" by not storing the hash sum, and by changing it to a closed addressing/chained version. I think a lot of code will be common between the 32bit and 64bit versions, so they will coexist in one file with several if flag?(:32bit).

RX14 · 2017-11-18T18:00:08Z

Perhaps separating the "implementation-specific" and "helper-methods" parts of hash.cr could keep it more organised. Keep the implementation-specific methods which are changed by this PR at the bottom of the file, and everything else above them. We can then have a big comment which explains the implementation halfway down the file before all the implementation.

akzhan · 2017-11-24T17:59:51Z

This pull request is the only one inspired by #4675 that is not yet finished and merged. What is its state now?

funny-falcon · 2017-11-24T18:56:30Z

I was a bit busy with work and family. I plan to spend some time on weekends.

To work around 32bit platform's issues, use chaining instead of open addressing.

funny-falcon · 2017-11-26T04:22:29Z

damn :-( now 64bit version fails with "Repeated allocation of very large block"... I can't understand, why ;-(

funny-falcon · 2017-11-26T04:26:40Z

And I can't reproduce it on my notebook :-( My version of libgc doesn't complain, and neither does CircleCI.

akzhan · 2017-11-26T11:07:11Z

Can you ask @ivmai about strange "Repeated allocation of very large block"?

yxhuvud · 2017-11-26T11:58:04Z

I thought the repeated allocation was just a warning. I get that in my implementation too.

EDIT: Or well, it is not really 'just' a warning as stuff like this shouldn't warn, but not an error.

RX14 · 2017-11-26T14:53:44Z

No, the error is that the compiler specs ran out of memory. Check the usage before and after locally.

funny-falcon · 2017-11-26T15:21:04Z

The problem is that I can't reproduce it, and CircleCI can't either. So it looks like it depends on the host's libgc.

I have one guess: it is probably because the old code doesn't reallocate the huge bins array on #clear. But to check that I need to refactor the code.

funny-falcon · 2017-11-26T17:58:30Z

I turned off deallocation on clear, but it looks like it doesn't help.
I don't know what to do further :-(

RX14 · 2017-11-26T18:04:56Z

@funny-falcon it's unlikely to be a libgc bug. You probably can't reproduce it simply because you have more ram than the travis VM. Look at your ram usage when running make compiler_spec before and after this PR.

funny-falcon · 2017-11-26T18:17:46Z

@RX14, I don't claim it is a libgc bug. I said that libgc behaves differently on travis-ci and on my computer. Unfortunately, I don't get these warnings on my PC, which complicates the investigation a lot.

funny-falcon · 2017-11-26T18:30:43Z

At about the 5-minute mark (~610 samples), master had consumed 490MB and the patched build 390MB, so unfortunately the problem is not direct memory consumption.

RX14 · 2017-11-26T19:26:08Z

@funny-falcon that's very strange.

vlazar · 2018-06-16T16:47:35Z

@funny-falcon Do you think it would make a sense to try with latest Crystal release? It's been a while, maybe the issue disappeared? :)

funny-falcon · 2018-06-16T18:06:12Z

Yes, I will try

RX14 · 2018-06-21T19:30:01Z

Yeah, I'm quite sad this PR stalled. Hash is one of the most-used data structures in the compiler and in many other Crystal programs, and it's probably the data structure with the most scope for performance changes in its implementation. I'd love to see Crystal have a well-optimized Hash, and see what performance effects it has, especially if it has any effect on compile times.

vlazar · 2019-10-12T06:03:07Z

With #8017 merged in should this one be closed?

ysbaddaden · 2019-10-12T06:15:48Z

Yes, closing.

Change Hash implementation

5ad1b90

use array of entries + open addressing index of entries.

funny-falcon mentioned this pull request Nov 7, 2017

Reimplementation a Hash using open addressing #4557

Closed

RX14 reviewed Nov 7, 2017


funny-falcon closed this Nov 7, 2017

funny-falcon added 2 commits November 12, 2017 12:49

consider naming review

a9d8396

Chunked Hash implementation.

6295806

funny-falcon reopened this Nov 12, 2017

crystal tool format

0529853

Sija reviewed Nov 13, 2017


yxhuvud reviewed Nov 13, 2017


closed addressing

b7641b0

To work around 32bit platform's issues, use chaining instead of open addressing.

funny-falcon force-pushed the openaddressing branch from aa84284 to b7641b0 Compare November 26, 2017 03:20

...

5c2acdf

... some meaningless changes

bf42892

experiment: do not resize down hash

60efb4d

ysbaddaden closed this Oct 12, 2019
