-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Short circuit Hash#find_entry for empty Hashes #5085
Short circuit Hash#find_entry for empty Hashes #5085
Conversation
like what percentage of empty hashes do you expect to be present? if it's just 1/1000 hashes that is empty there is not much to gain and if there are lots of empty hashes then there probably is something wrong in the code? (although the penalty of this commit actually seems to be really low if any) what if you shortcut it manually using an if before the query and asking hash.empty? |
Updated the OP with kemal's performance numbers:
|
@@ -833,6 +833,7 @@ class Hash(K, V) | |||
end | |||
|
|||
protected def find_entry(key) | |||
return nil if empty? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please can we have a blank line seperating this check and the definitions below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
By exiting find_entry early when the size of the Hash is 0, we avoid doing the actual bucket calculations. For empty Hashes this means a 4.4X improvement in lookups, with no apparent change for non-empty Hashes. Sample benchmark script: ```crystal require "benchmark" empty_hash = {} of String => String small_hash = {"a" => "a"} large_hash = {} of String => String 10_000.times { |i| large_hash["a#{i}"] = "a#{i}" } Benchmark.ips do |x| x.report("empty Hash") { empty_hash.has_key?("foo") } x.report("small Hash, key exists") { small_hash.has_key?("a") } x.report("small Hash, key doesn't exist") { small_hash.has_key?("foo") } x.report("large Hash, key exists") { large_hash.has_key?("a1234") } x.report("large Hash, key doesn't exist") { large_hash.has_key?("foo") } end ``` Without this commit: ``` empty Hash 70.32M ( 14.22ns) (±8.77%) small Hash, key exists 80.63M ( 12.4ns) (±6.42%) small Hash, key doesn't exist 75.42M ( 13.26ns) (±7.84%) large Hash, key exists 53.28M ( 18.77ns) (±7.98%) large Hash, key doesn't exist 58.86M ( 16.99ns) (±8.41%) ``` With this commit: ``` empty Hash 309.52M ( 3.23ns) (±6.53%) small Hash, key exists 81.66M ( 12.25ns) (±6.41%) small Hash, key doesn't exist 75.84M ( 13.19ns) (±6.12%) large Hash, key exists 53.34M ( 18.75ns) (±6.16%) large Hash, key doesn't exist 62.54M ( 15.99ns) (±7.20%) ```
7120ff2
to
cecb364
Compare
This was an interesting catch 🥇 Thanks @Fryguy 🙏 |
These benchmarks are quite suspect, as they basically only verify that the CPU branch prediction work. |
By exiting find_entry early when the size of the Hash is 0, we avoid
doing the actual bucket calculations. For empty Hashes this means a
4.4X improvement in lookups, with no apparent change for non-empty
Hashes.
Sample benchmark script:
Without this commit:
With this commit:
@sdogruyol noticed this in Kemal: kemalcr/kemal@63e613a In this case, they showed a 3-5% performance improvement: https://gitter.im/sdogruyol/kemal?at=59d66e18f7299e8f53b14208