Short circuit Hash#find_entry for empty Hashes #5085

Fryguy · 2017-10-06T18:30:29Z

By exiting find_entry early when the size of the Hash is 0, we avoid
doing the actual bucket calculations. For empty Hashes this means a
4.4X improvement in lookups, with no apparent change for non-empty
Hashes.

Sample benchmark script:

require "benchmark"

empty_hash = {} of String => String
small_hash = {"a" => "a"}
large_hash = {} of String => String
10_000.times { |i| large_hash["a#{i}"] = "a#{i}" }

Benchmark.ips do |x|
  x.report("empty Hash") { empty_hash.has_key?("foo") }
  x.report("small Hash, key exists") { small_hash.has_key?("a") }
  x.report("small Hash, key doesn't exist") { small_hash.has_key?("foo") }
  x.report("large Hash, key exists") { large_hash.has_key?("a1234") }
  x.report("large Hash, key doesn't exist") { large_hash.has_key?("foo") }
end

Without this commit:

                   empty Hash   70.32M ( 14.22ns) (±8.77%)
       small Hash, key exists   80.63M (  12.4ns) (±6.42%)
small Hash, key doesn't exist   75.42M ( 13.26ns) (±7.84%)
       large Hash, key exists   53.28M ( 18.77ns) (±7.98%)
large Hash, key doesn't exist   58.86M ( 16.99ns) (±8.41%)

With this commit:

                   empty Hash  309.52M (  3.23ns) (±6.53%)
       small Hash, key exists   81.66M ( 12.25ns) (±6.41%)
small Hash, key doesn't exist   75.84M ( 13.19ns) (±6.12%)
       large Hash, key exists   53.34M ( 18.75ns) (±6.16%)
large Hash, key doesn't exist   62.54M ( 15.99ns) (±7.20%)

@sdogruyol noticed this in Kemal: kemalcr/kemal@63e613a In this case, they showed a 3-5% performance improvement: https://gitter.im/sdogruyol/kemal?at=59d66e18f7299e8f53b14208

ghost · 2017-10-06T20:21:03Z

like what percentage of empty hashes do you expect to be present?
(seems not all that frequent to me tbh)

if it's just 1/1000 hashes that is empty there is not much to gain and if there are lots of empty hashes then there probably is something wrong in the code?

(although the penalty of this commit actually seems to be really low if any)

what if you shortcut it manually using an if before the query and asking hash.empty?

Fryguy · 2017-10-06T20:32:20Z

Updated the OP with kemal's performance numbers:

In this case, they showed a 3-5% performance improvement: https://gitter.im/sdogruyol/kemal?at=59d66e18f7299e8f53b14208

RX14 · 2017-10-06T21:34:52Z

src/hash.cr

@@ -833,6 +833,7 @@ class Hash(K, V)
  end

  protected def find_entry(key)
+    return nil if empty?


Please can we have a blank line seperating this check and the definitions below.

By exiting find_entry early when the size of the Hash is 0, we avoid doing the actual bucket calculations. For empty Hashes this means a 4.4X improvement in lookups, with no apparent change for non-empty Hashes. Sample benchmark script: ```crystal require "benchmark" empty_hash = {} of String => String small_hash = {"a" => "a"} large_hash = {} of String => String 10_000.times { |i| large_hash["a#{i}"] = "a#{i}" } Benchmark.ips do |x| x.report("empty Hash") { empty_hash.has_key?("foo") } x.report("small Hash, key exists") { small_hash.has_key?("a") } x.report("small Hash, key doesn't exist") { small_hash.has_key?("foo") } x.report("large Hash, key exists") { large_hash.has_key?("a1234") } x.report("large Hash, key doesn't exist") { large_hash.has_key?("foo") } end ``` Without this commit: ``` empty Hash 70.32M ( 14.22ns) (±8.77%) small Hash, key exists 80.63M ( 12.4ns) (±6.42%) small Hash, key doesn't exist 75.42M ( 13.26ns) (±7.84%) large Hash, key exists 53.28M ( 18.77ns) (±7.98%) large Hash, key doesn't exist 58.86M ( 16.99ns) (±8.41%) ``` With this commit: ``` empty Hash 309.52M ( 3.23ns) (±6.53%) small Hash, key exists 81.66M ( 12.25ns) (±6.41%) small Hash, key doesn't exist 75.84M ( 13.19ns) (±6.12%) large Hash, key exists 53.34M ( 18.75ns) (±6.16%) large Hash, key doesn't exist 62.54M ( 15.99ns) (±7.20%) ```

sdogruyol · 2017-10-07T12:12:16Z

This was an interesting catch 🥇 Thanks @Fryguy 🙏

yxhuvud · 2017-10-14T08:23:29Z

These benchmarks are quite suspect, as they basically only verify that the CPU branch prediction work.

RX14 reviewed Oct 6, 2017

View reviewed changes

Fryguy force-pushed the faster_empty_hash_lookup branch from 7120ff2 to cecb364 Compare October 6, 2017 22:38

RX14 approved these changes Oct 6, 2017

View reviewed changes

asterite approved these changes Oct 13, 2017

View reviewed changes

ysbaddaden merged commit 53b208c into crystal-lang:master Oct 14, 2017

Fryguy deleted the faster_empty_hash_lookup branch October 17, 2017 15:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Short circuit Hash#find_entry for empty Hashes #5085

Short circuit Hash#find_entry for empty Hashes #5085

Fryguy commented Oct 6, 2017 •

edited

Loading

ghost commented Oct 6, 2017 •

edited by ghost

Loading

Fryguy commented Oct 6, 2017

RX14 Oct 6, 2017

Fryguy Oct 6, 2017

sdogruyol commented Oct 7, 2017

yxhuvud commented Oct 14, 2017

Short circuit Hash#find_entry for empty Hashes #5085

Short circuit Hash#find_entry for empty Hashes #5085

Conversation

Fryguy commented Oct 6, 2017 • edited Loading

ghost commented Oct 6, 2017 • edited by ghost Loading

Fryguy commented Oct 6, 2017

RX14 Oct 6, 2017

Choose a reason for hiding this comment

Fryguy Oct 6, 2017

Choose a reason for hiding this comment

sdogruyol commented Oct 7, 2017

yxhuvud commented Oct 14, 2017

Fryguy commented Oct 6, 2017 •

edited

Loading

ghost commented Oct 6, 2017 •

edited by ghost

Loading