Open addressing algorithm for RubyHash #5215
private final int[] internalCopyBins() {
    int[] newBins = new int[bins.length];
    for (int i = 0; i < bins.length; i++) {
These can now be a simple System.arraycopy(bins, 0, newBins, 0, bins.length).
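For illustration, the simplified copy might look like the sketch below. This is hypothetical code, assuming a class with an int[] bins field as in the PR; CopyBinsDemo and its field values are made up for the example.

```java
public class CopyBinsDemo {
    // Hypothetical stand-in for RubyHash's bins field
    static int[] bins = {3, -1, 0, -2};

    // The suggested simplification: one arraycopy instead of a manual loop
    static int[] internalCopyBins() {
        int[] newBins = new int[bins.length];
        System.arraycopy(bins, 0, newBins, 0, bins.length);
        return newBins;
    }

    public static void main(String[] args) {
        int[] copy = internalCopyBins();
        // The copy is equal in content but independent of the original
        System.out.println(java.util.Arrays.equals(copy, bins)); // prints "true"
    }
}
```

(`bins.clone()` would be equally idiomatic here; arraycopy matches the reviewer's suggestion.)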
}
if (entry != null && entry.isLive()) {
if (entry == null) break;
So there are no longer null buckets for old (removed) entries, is that correct?
Currently we mark them with isLive() = false and iterate to the first null value (the current end of the entries array).
But the better approach is probably to set them to null when deleting and iterate up to size. I will implement this in the next round; this was just quick & dirty to get ruby booting again ...
In the bins array we have three different values now:
- -1: empty, stop searching
- -2: deleted, jump over it
- >= 0: index of the hash / key
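As a rough sketch of that sentinel scheme (hypothetical code, not the PR's actual implementation: the class name BinsDemo is made up, and resizing, duplicate keys, and negative hashCode handling are ignored):

```java
import java.util.Arrays;

// bins holds -1 (empty, stop searching), -2 (deleted, jump over it),
// or an index >= 0 into a parallel entries/keys array.
public class BinsDemo {
    static final int EMPTY = -1;
    static final int DELETED = -2;

    int[] bins;
    Object[] keys;
    int size;

    BinsDemo(int capacity) {
        bins = new int[capacity];
        Arrays.fill(bins, EMPTY);
        keys = new Object[capacity];
    }

    // Returns the bin slot holding this key, or -1 if absent
    int bucketOf(Object key) {
        int i = Math.abs(key.hashCode()) % bins.length;
        while (true) {
            int idx = bins[i];
            if (idx == EMPTY) return -1;                 // stop searching
            if (idx != DELETED && keys[idx].equals(key)) return i;
            i = (i + 1) % bins.length;                   // probe past DELETED
        }
    }

    // Insert-only sketch: assumes key is not already present
    void put(Object key) {
        int i = Math.abs(key.hashCode()) % bins.length;
        while (bins[i] >= 0) i = (i + 1) % bins.length;
        keys[size] = key;
        bins[i] = size++;
    }

    void delete(Object key) {
        int i = bucketOf(key);
        if (i >= 0) bins[i] = DELETED;                   // mark, don't null out
    }

    public static void main(String[] args) {
        BinsDemo d = new BinsDemo(8);
        d.put("a");
        d.put("b");
        d.delete("a");
        // Probing correctly skips the DELETED slot and still finds "b"
        System.out.println(d.bucketOf("a") == -1 && d.bucketOf("b") >= 0); // prints "true"
    }
}
```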
I modified your benchmark to use benchmark-ips and simple loops, since plain benchmark tends not to exercise methods well and block-based loops have much more overhead than simple loops.

require 'benchmark/ips'

N = 50_000
small = 1000
tmp = {}
Benchmark.ips do |x|
  x.report("Add:") { i = 0; while i < N; i+=1; tmp[small] = true; end }
  x.report("Fetch:") { i = 0; while i < N; i+=1; tmp[small]; end }
end

Here are the numbers with standard options, invokedynamic, and invokedynamic on GraalVM. Before your work:
And after your work:
Looks like a good win so far!
I did a size comparison. For a Hash with 100_000 entries mapping 0-99_000 to 1, the old impl is 11.847MB in memory and the new version is 11.382MB. If we subtract some of the common objects:
That gives us old and new sizes of 2.547MB and 2.082MB, a savings of 0.465MB or over 18% on the base RubyHash structure. That's pretty good! So based on that I thought I'd try some small hashes. Subtracting the size of the symbol and fixnum, this hash is 166 bytes on the new impl and 52 bytes on the old impl, so this may be justification for packing simple hashes down a bit tighter.
@headius thanks for the benchmark 👍
Ok! I already had some code for a LinearSearch approach without the bins array. I will try to merge these two implementations tonight. I will also look into getting rid of the RubyHashEntry elements completely (discussed yesterday with @lopex) and extracting it into a Util class. Stay tuned 🎸
If we can get rid of the RubyHashEntry objects that would be a tremendous memory savings.
entries = new RubyHashEntry[MRI_INITIAL_CAPACITY];
bins = new int[MRI_INITIAL_CAPACITY * 2];
entries = new IRubyObject[MRI_INITIAL_CAPACITY * 2];
bins = new int[MRI_INITIAL_CAPACITY * 4];
This might not make sense. The entries array is now twice as big but also fills up twice as fast (because we store key and value in separate rows now). So I think we could still go with MRI_INITIAL_CAPACITY * 2 for the bins array.
I pushed an initial version of getting rid of the RubyHashEntries. For now, I don't store them anymore but use them as temporary objects. We now store the keys and values in separate rows in the entries array. Keys are stored in the even rows and values in the odd rows. This adds quite some complexity from my POV 😢 I haven't benchmarked it yet as I think there is a bug in the script you posted (you never increase the small variable, do you?). Would be great if you could have a look at it again and run your benchmarks @headius. I also tried to move this to a Util class but I have some open questions:
However, if we want to implement the LinearSearchHash for small hashes, I think it would be very beneficial if we could extract these two classes...
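To illustrate the even/odd row layout, here is a hypothetical linear-scan sketch (not the actual JRuby code; InterleavedDemo is a made-up name and resizing is omitted). Key i lives at index 2*i and its value at 2*i+1:

```java
// Keys at even indices, values at odd indices, in one Object[].
public class InterleavedDemo {
    Object[] entries = new Object[16];  // room for 8 key/value pairs
    int size;                           // number of pairs

    void put(Object key, Object value) {
        for (int i = 0; i < size; i++) {
            if (entries[2 * i].equals(key)) {   // even row: key
                entries[2 * i + 1] = value;     // odd row: value
                return;
            }
        }
        entries[2 * size] = key;
        entries[2 * size + 1] = value;
        size++;
    }

    Object get(Object key) {
        for (int i = 0; i < size; i++) {
            if (entries[2 * i].equals(key)) return entries[2 * i + 1];
        }
        return null;
    }

    public static void main(String[] args) {
        InterleavedDemo h = new InterleavedDemo();
        h.put("a", 1);
        h.put("b", 2);
        h.put("a", 3);  // overwrites the odd row, not a new pair
        System.out.println(h.get("a") + " " + h.get("b")); // prints "3 2"
    }
}
```

The complexity Chris mentions comes from exactly this index arithmetic: every loop over entries has to step by two and keep the even/odd convention straight.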
@headius I now added a linear search approach for small hashes (< 16 elements).
@headius any update regarding your benchmarking script or how I should proceed?
@ChrisBr he is on vacation this week. You could fix the bench and generate some numbers to see if removing the extra types helped (which of course means running the fixed benchmark with the original RubyHashEntry, without linear search). Memory we can probably calculate manually. We would largely remove 3.7M - field size.
@enebo okay! Here you go: This is the script I used:
Benchmark of this PR
Benchmark open addressing with HashEntry elements, no linear search
Benchmark master
Summary: Looks like the linear search approach for 15 elements (the max size for using linear search in MRI) is quite a bit slower than hashing (open/closed addressing). We could try to use a smaller array (e.g. only using it for 8 elements) at the cost of adding one more resize. Also 1_000 / 10_000 elements seem to be slower without HashEntry objects. Maybe because the array is generally bigger? I don't have a version of linear search with HashEntry objects; @enebo do you think it makes sense to implement such a version for comparison? Thanks
@ChrisBr @enebo just looking at https://github.com/ruby/ruby/blob/trunk/st.c#L73 is enough to get rid of that hash cache value for me. If we get rid of it, we can store the pairs as values in the bin array.
We already got rid of the hash cache value, as we only store the keys and values in the entries array. The bin array contains the index of the location in the entries array. So you would suggest storing key & value in both the bins and entries arrays? Or getting rid of one array? Getting rid of one is probably not possible, as one array makes it possible to access the keys/values in insertion order (entries) and the other by hash index (bins). cc @lopex
So I changed the boundary for linear search to 8 elements as 16 seemed clearly too high. These are the results for 7 elements in the hash. Open Addressing without RubyHashEntry
Linear search without RubyHashEntry (this PR)
master
Adding seems to be around the same for master and linear search, but fetching is still faster with current master.
I also tried the fast skip bucket approach with calculating the hash on the fly (because we don't store it in the RubyHashEntry anymore), but it seems to be a lot slower: For linear search with 7 elements (here we end up calculating the hash twice, since linear search doesn't otherwise need the hash to find the bucket; we just iterate)
For open addressing with 16 & 100 elements (we need to calculate the hash only once, as we already have it calculated to find the bucket)
See #5215 (comment) for reference
Ok I'm back :-) Thoughts on what else we might try to speed up fetches? I think overall I want to go forward with this PR, since the memory sizing and locality will probably make up for a minor degradation in straight-line perf.
@headius great you're back, hope you had an amazing vacation 🌴 🌞 I just saw your conversation on IRC. For linear search, we currently store everything in the entries array (without a bin array) like this:
So we do not cache the hash anymore, which means we always do a full object comparison (eql) for every entry when we iterate over it (fetch, add, delete ...). If we cache the hash, we could do a quick skip and only check the hash. The idea I discussed with @lopex was now:
However, as we need to store objects, we need to store the hash as a boxed int. The other idea you mentioned could also be to introduce another hash array, something like this:
This would introduce some indirection. So far I'm about to implement the first approach (not done yet) to see the impact. We also still have the RubyHashEntry as intermediate objects, which we should get rid of. In general, do we want to merge everything altogether, or first only the linear search, or what is our preferred way to go forward?
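A minimal sketch of the first idea, hedged accordingly: hypothetical code (PackedTripleDemo is a made-up name, resizing is omitted) storing (hash, key, value) triples in a single Object[], with the cached hash boxed as an Integer and used as a fast skip before the full equals comparison:

```java
// Triples laid out flat: [hash0, key0, value0, hash1, key1, value1, ...]
public class PackedTripleDemo {
    Object[] entries = new Object[3 * 8];  // room for 8 triples
    int size;                              // number of triples

    void put(Object key, Object value) {
        int h = key.hashCode();
        for (int i = 0; i < size; i++) {
            int base = 3 * i;
            if ((int) entries[base] == h && entries[base + 1].equals(key)) {
                entries[base + 2] = value;
                return;
            }
        }
        int base = 3 * size++;
        entries[base] = h;           // autoboxed hash cache (the boxed int)
        entries[base + 1] = key;
        entries[base + 2] = value;
    }

    Object get(Object key) {
        int h = key.hashCode();
        for (int i = 0; i < size; i++) {
            int base = 3 * i;
            if ((int) entries[base] != h) continue;  // quick skip on hash only
            if (entries[base + 1].equals(key)) return entries[base + 2];
        }
        return null;
    }
}
```

For comparison, this is also the layout TruffleRuby uses for small hashes, as mentioned later in this thread.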
So I tried the hash caching from #5215 (comment). Result storing the hash in the entries array (boxing of the hash int value):
For comparison, this is the result without any hash caching:
edit: updated to make the result more clear
Just for information, TruffleRuby uses the same layout for small Hashes, an Object[] with (hash, key, value) per entry: https://github.com/oracle/truffleruby/blob/41797d45bd6dc95af17c8829a7dd0e9c4ef3a207/src/main/java/org/truffleruby/core/hash/PackedArrayStrategy.java#L30-L32
@eregon thanks, very interesting read!
@ChrisBr Today @enebo pointed out to me that these benchmarks are probably a best case scenario for hash, which might make the real cost of eliminating the cache a bit harder to see. If these were user-defined objects, for example, they'd have a nontrivial hash calculation cost plus dynamic dispatch plus a Ruby method, which would be much more overhead than, say, Float#hash. This is just more justification for doing the tagalong long[] for caching hash values, I guess.
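A sketch of that tagalong idea (hypothetical code, TagalongDemo is a made-up name, resizing omitted): a parallel long[] carries the cached hash for each entry unboxed, so the skip check is a cheap primitive comparison, and the potentially expensive hash/eql of a user-defined key only runs once per insert and on actual hash matches:

```java
// Parallel arrays: hashes[i] caches the hash of keys[i], unboxed.
public class TagalongDemo {
    Object[] keys = new Object[8];
    Object[] values = new Object[8];
    long[] hashes = new long[8];   // the "tagalong" hash cache
    int size;

    void put(Object key, Object value) {
        long h = key.hashCode();   // computed once, then cached
        for (int i = 0; i < size; i++) {
            if (hashes[i] == h && keys[i].equals(key)) {
                values[i] = value;
                return;
            }
        }
        hashes[size] = h;
        keys[size] = key;
        values[size++] = value;
    }

    Object get(Object key) {
        long h = key.hashCode();
        for (int i = 0; i < size; i++) {
            if (hashes[i] != h) continue;        // primitive skip, no equals()
            if (keys[i].equals(key)) return values[i];
        }
        return null;
    }
}
```

Compared with boxing the hash into the Object[] entries array, this avoids one allocation per entry at the cost of keeping a second array in sync.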
@headius @enebo I updated the PR and here are the latest numbers: 7 Entries
50
100
1_000
10_000
Summary
a0afbdc
to
09965f3
Compare
This is the script I used for benchmarking. Compared to the previous script, I switched to strings as keys, since the hash of a Fixnum is basically its value, which made for a really bad distribution in the hash.
Here are the results with Fixnum as keys: 7 entries
100 entries
1_000 entries
10_000 entries
As requested, here are some numbers with a Struct as key:
7 entries
100 entries
1_000 entries
10_000 entries
Ok I have some great news! I ran a little keyword argument benchmark against 9.2, 9.2.1, and 9.2.1 with this PR. The results are extremely promising:
So while these changes show a small perf drop on Hotspot C2 versus non-PR 9.2.1 (potentially within margin of error), they are in fact MUCH faster on Graal: around 3x, which is in turn 7x faster than MRI and 14x faster than 9.2 on Hotspot C2. If this is passing tests, I propose we land it now so it can bake for a while on master. I've pushed the PR contents to my improve_kwargs branch, which includes some tweaks for kwarg processing. @ChrisBr This is very exciting!
which basically just created a linked list. This approach is no longer possible with open addressing. As open addressing hashes are more compact anyway, we don't need it anymore.
Ok, @enebo and I see no reason not to go ahead and merge!
just adding a note to myself here: the work here never shipped and this PR got reverted before 9.2.1.0 was released. P.S. it would be nice to reconsider this for 9.4
@kares I would very much like to reintroduce open addressing. The main problem here is that it made an already concurrency-sensitive implementation (chained buckets) even more sensitive, since the number of state changes for any update increased and there's more opportunity for threads to step on each other. I believe that in order to safely support open addressing in Hash, we need to finally make Hash concurrency-safe. That means either locking (heavy-handed solution) or clever use of atomic updates (tricky to do without allocating a state object for every change, which would impact performance).
This is a very early stage of implementing open addressing for JRuby hashes suggested in #4708. The MRI issue is located here: https://bugs.ruby-lang.org/issues/12142.
After discussing with @headius, he suggested opening a PR to continue the discussion here. Please note that this is far from a mergeable state.
I tried a very simple benchmark and noticed a small performance improvement:
Gives me with this PR:
Current master:
(Intel Core i7-7820HQ, 32 GB RAM)
However, I would need some guidance on how to gather meaningful results.
Currently also missing is the approach of using only one bucket for small hashes. I already have this working in another branch; however, I did not see a (speed) improvement. It could be worth investigating from a memory standpoint, though.
Thanks for any feedback!