Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change Hash implementation #5256

Closed
wants to merge 8 commits into from

Conversation

funny-falcon
Copy link
Contributor

use array of entries + open addressing index of entries.

Closes #4557

use array of entries + open addressing index of entries.
Copy link
Contributor

@RX14 RX14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some name changes, and a explanitory comment on how it works would be invaluable to reviewing this.

src/hash.cr Outdated
@buckets_size : Int32
@first : Entry(K, V)?
@last : Entry(K, V)?
@sz : UInt8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is badly named.

src/hash.cr Outdated
end

# Similar to `#first_key?`, but returns its value.
def first_value?
@first.try &.value
unless empty?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return nil if empty?/return yield if empty? is more readable and ditto down the page.

src/hash.cr Outdated
protected def find_entry(key)
return nil if empty?
@[AlwaysInline]
private def iter_entries
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

each_entry?

src/hash.cr Outdated
@sz = newsz
end

private def need_shrink(size : Int32, sz : UInt8) : Bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs_shrink??

src/hash.cr Outdated

private def nindex(sz)
mask = SIZES[sz].indexmask
mask + (mask != 0 ? 1 : 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mask == 0 ? 0 : 1

src/hash.cr Outdated
(h >> 1) + 1
end

private def nindex(sz)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a very crystally name.

@funny-falcon
Copy link
Contributor Author

Worse problem: doesn't look it is faster :-(
While some microbenchmarks I've made shows it is a bit faster, http helloworld is slower, and std_spec is also a bit slower.

And 32bit compiler test didn't pass in Travis-CI.

@funny-falcon
Copy link
Contributor Author

Looks like, reallocation costs too much.
Probably could be better if entries are allocated by chunks instead of by huge array.
I'll close this PR now until improving.

@akzhan
Copy link
Contributor

akzhan commented Nov 8, 2017

I think that Hash may have two different implementations for small (<7) and other sizes.

I'm sure that small hashes have a big value.

@asterite
Copy link
Member

asterite commented Nov 9, 2017

I benchmarked this PR and it works twice as fast as the current Hash. Here's my benchmark:

a = {} of Int32 => String

strings = Array.new(10_000_000, &.to_s)

time = Time.now
strings.each_with_index do |s, i|
  a[i] = s
end
puts Time.now - time

Remember to run this with bin/crystal in master and bin/crystal with your branch, so we are sure both use funny hash.

It would be interesting to see your benchmarks.

I think this Hash should be more efficient because in the current Hash every time you add a key-value pair there's an allocation for the node. In your implementation that doesn't seem to be the case.

@funny-falcon
Copy link
Contributor Author

It is expected for single huge hash to be faster.

But http helloworld ( from crystal-lang.org main page) prouces many small hashes.

@RX14
Copy link
Contributor

RX14 commented Nov 9, 2017

Testing compiler compile performance (with --no-codegen) before and after is likely to be a good indicator of whether this hash is better just in theory or in practice. Microbenchmarks are only so useful.

I don't like the idea of having a rigid cutoff point with 2 different hash implementations. There must be a way to make it fast at any size.

Another optimization I would love to make is having only 1 malloc call for am empty hash, as at least in the rust compiler I know they found that hashtables being created, never used, but discarded was fairly common.

@funny-falcon
Copy link
Contributor Author

I have an idea to use chunked allocaion, ie allocate by chunks of 8 entries.
Probably it will be faster than current hash implementation, and not too slower than this PR on huge hashes. I will investigate it.

@RX14
Copy link
Contributor

RX14 commented Nov 9, 2017

@funny-falcon Assuming that a larger chunk size would increase performance, you could change chunk size based on the size of the hash to keep the memory overhead bounded when small and performance overhead miniscule when large.

@akzhan
Copy link
Contributor

akzhan commented Nov 10, 2017

What about implementation of pessimistic downsize politics?

@funny-falcon
Copy link
Contributor Author

@akzhan , what do you mean?

@akzhan
Copy link
Contributor

akzhan commented Nov 10, 2017 via email

@funny-falcon
Copy link
Contributor Author

That was already in this patch: it resized down only if size < capacity / 4

@funny-falcon
Copy link
Contributor Author

Looks like chunked version is not slower than current implementation even on http hello world, so that I reopen pull request.

@funny-falcon funny-falcon reopened this Nov 12, 2017
@RX14
Copy link
Contributor

RX14 commented Nov 12, 2017

@funny-falcon Please give us some documentation on how this works to work with!

@rebuild_num = 0_u16
@first = 0_u32
@last = 0_u32
@index = Pointer(UInt32).new(0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest using Pointer.empty.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean Pointer.null.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RX14 right, my bad.

src/hash.cr Outdated
@block = block
unless initial_capacity.nil?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if initial_capacity

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it may be zero, isn't it?

Copy link
Contributor

@RX14 RX14 Nov 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only nil, Pointer.null and false are falsey in crystal. 0 is truthy, which means the only practical type where you need to use Object#nil? is Bool?, which is fairly rare and a situation often better described with an enum.

src/hash.cr Outdated
chunks = chunks_ptr
@first.upto(@last - 1) do |i|
chunk = chunks[i / CHUNK]
entry = chunk + i % CHUNK
Copy link
Contributor

@yxhuvud yxhuvud Nov 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not certain there is a noticeable difference or if llvm is smart enough to notice this by itself, but perhaps

chunk_index, offset = i.divmod(CHUNK)

to save a division.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

llvm should optimize division and module by 8 to shifts and mask (ie i >> 3 and i & 7)

@akzhan
Copy link
Contributor

akzhan commented Nov 14, 2017

Is it GTG?

@funny-falcon
Copy link
Contributor Author

It plays bad with conservative GC on 32 bit platform:

  • stored hashsum could be counted as pointers (failed weakptr test on travis ci). That is why in first version I stored hashsums in separate array (GC then knows to not look inside).
  • index array is five-ten times greater than bins in current implementation. Looks like GC doesn't like such big arrays.

So, while small scripts will still work, larger applications become less reliable on 32bit platform with this patch.

So, if support of 32 bit platform is important, this patch should be improved. I don't know, how it should be done. I'm still thinking.

@funny-falcon
Copy link
Contributor Author

Doubtfuly Crystal will gain precise GC soon.

@akzhan
Copy link
Contributor

akzhan commented Nov 14, 2017

Have you seen #5271?

@pnloyd
Copy link

pnloyd commented Nov 15, 2017

@akzhan, not totally clear if anyone can or wants to push that forward ATM though.

@RX14
Copy link
Contributor

RX14 commented Nov 17, 2017

I wouldn't focus on the performance of 32-bit too much, but it should work and be stable. 32bit is almost obsolete on x86, but on arm it'll still be a while until 32bit SoCs stop being sold.

@funny-falcon
Copy link
Contributor Author

32bit version works. It just doesn't play well with conservative GC.

I may make 32bit version more "workable" by not storing hash-sum, and changing it to closed addressing/chained version. I think, a lot of code will be common between 32bit and 64bit versions, so they will co-exists in one file with several if flag?(:32bit).

@RX14
Copy link
Contributor

RX14 commented Nov 18, 2017

Perhaps seperating the "implementation-specific" and "helper-methods" parts of hash.cr could keep it more organised. Keep the implementation-specific methods which are changed by this PR at the bottom of the file, and everything else above the file. We can then have a big comment which explains the implementation halfway down the file before all the implementation.

@akzhan
Copy link
Contributor

akzhan commented Nov 24, 2017

This Pull request is alone that not yet finished anf merged with inspiration of #4675, what is it's state now?

@funny-falcon
Copy link
Contributor Author

I was a bit busy with work and family. I plan to spend some time on weekends.

To work around 32bit platform's issues, use chaining instead of
open addressing.
@funny-falcon
Copy link
Contributor Author

damn :-( now 64bit version fails with "Repeated allocation of very large block"... I can't understand, why ;-(

@funny-falcon
Copy link
Contributor Author

And I can't reproduce on my notebook :-( My version of libgc doesn't complain against. As well as circleci.

@akzhan
Copy link
Contributor

akzhan commented Nov 26, 2017

Can you ask @ivmai about strange "Repeated allocation of very large block"?

@yxhuvud
Copy link
Contributor

yxhuvud commented Nov 26, 2017

I thought the repeated allocation was just a warning. I get that in my implementation too.

EDIT: Or well, it is not really 'just' a warning as stuff like this shouldn't warn, but not an error.

@RX14
Copy link
Contributor

RX14 commented Nov 26, 2017

No, the error is that the compiler specs ran out of memory. Check the usage before and after locally.

@funny-falcon
Copy link
Contributor Author

Problem is I cann't reproduce. And circleci cann't as well. So, looks like it depends on hosts libgc.

I have one suggestion: probably it is because old code doesn't reallock huge bins array on #clean . But to check I need refactor code.

@funny-falcon
Copy link
Contributor Author

I turned off deallocation on clear, but looks like it doesn't help.
I don't know what to do further :-(

@RX14
Copy link
Contributor

RX14 commented Nov 26, 2017

@funny-falcon it's unlikely to be a libgc bug. You probably can't reproduce it simply because you have more ram than the travis VM. Look at your ram usage when running make compiler_spec before and after this PR.

@funny-falcon
Copy link
Contributor Author

funny-falcon commented Nov 26, 2017

@RX14, I don't claim it is libgc bug. I've said, libgc behaves differently between travis-ci and my computer. Unfortunately, on my pc I have no this warnings. So it complicates investigation a lot.

@funny-falcon
Copy link
Contributor Author

On ~5minute point (~610 samples) master consumed 490MB and patched 390MB, therefore it is not direct memory consumption, unfortunately.

@RX14
Copy link
Contributor

RX14 commented Nov 26, 2017

@funny-falcon that's very strange.

@vlazar
Copy link
Contributor

vlazar commented Jun 16, 2018

@funny-falcon Do you think it would make a sense to try with latest Crystal release? It's been a while, maybe the issue disappeared? :)

@funny-falcon
Copy link
Contributor Author

Yes, I will try

@RX14
Copy link
Contributor

RX14 commented Jun 21, 2018

Yeah i'm quite sad this PR stalled, Hash is one of the most-used datastructures in the compiler and many other crystal programs. And it's probably the datastructure with most scope for performance changes in it's implementation. I'd love to see crystal have a well-optimized Hash, and see what performance effects it has, especially if it has any effect on compile times.

@vlazar
Copy link
Contributor

vlazar commented Oct 12, 2019

With #8017 merged in should this one be closed?

@ysbaddaden
Copy link
Contributor

Yes, closing.

@ysbaddaden ysbaddaden closed this Oct 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reimplementation a Hash using open addressing
9 participants