Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Short circuit Hash#find_entry for empty Hashes #5085

Merged
merged 1 commit into from Oct 14, 2017

Conversation

Fryguy
Copy link
Contributor

@Fryguy Fryguy commented Oct 6, 2017

By exiting find_entry early when the size of the Hash is 0, we avoid
doing the actual bucket calculations. For empty Hashes this means a
4.4X improvement in lookups, with no apparent change for non-empty
Hashes.

Sample benchmark script:

require "benchmark"

empty_hash = {} of String => String
small_hash = {"a" => "a"}
large_hash = {} of String => String
10_000.times { |i| large_hash["a#{i}"] = "a#{i}" }

Benchmark.ips do |x|
  x.report("empty Hash") { empty_hash.has_key?("foo") }
  x.report("small Hash, key exists") { small_hash.has_key?("a") }
  x.report("small Hash, key doesn't exist") { small_hash.has_key?("foo") }
  x.report("large Hash, key exists") { large_hash.has_key?("a1234") }
  x.report("large Hash, key doesn't exist") { large_hash.has_key?("foo") }
end

Without this commit:

                   empty Hash   70.32M ( 14.22ns) (±8.77%)
       small Hash, key exists   80.63M (  12.4ns) (±6.42%)
small Hash, key doesn't exist   75.42M ( 13.26ns) (±7.84%)
       large Hash, key exists   53.28M ( 18.77ns) (±7.98%)
large Hash, key doesn't exist   58.86M ( 16.99ns) (±8.41%)

With this commit:

                   empty Hash  309.52M (  3.23ns) (±6.53%)
       small Hash, key exists   81.66M ( 12.25ns) (±6.41%)
small Hash, key doesn't exist   75.84M ( 13.19ns) (±6.12%)
       large Hash, key exists   53.34M ( 18.75ns) (±6.16%)
large Hash, key doesn't exist   62.54M ( 15.99ns) (±7.20%)

@sdogruyol noticed this in Kemal: kemalcr/kemal@63e613a In this case, they showed a 3-5% performance improvement: https://gitter.im/sdogruyol/kemal?at=59d66e18f7299e8f53b14208

@ghost
Copy link

ghost commented Oct 6, 2017

like what percentage of empty hashes do you expect to be present?
(seems not all that frequent to me tbh)

if it's just 1/1000 hashes that is empty there is not much to gain and if there are lots of empty hashes then there probably is something wrong in the code?

(although the penalty of this commit actually seems to be really low if any)

what if you shortcut it manually using an if before the query and asking hash.empty?

@Fryguy
Copy link
Contributor Author

Fryguy commented Oct 6, 2017

Updated the OP with kemal's performance numbers:

In this case, they showed a 3-5% performance improvement: https://gitter.im/sdogruyol/kemal?at=59d66e18f7299e8f53b14208

@@ -833,6 +833,7 @@ class Hash(K, V)
end

protected def find_entry(key)
return nil if empty?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please can we have a blank line seperating this check and the definitions below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

By exiting find_entry early when the size of the Hash is 0, we avoid
doing the actual bucket calculations.  For empty Hashes this means a
4.4X improvement in lookups, with no apparent change for non-empty
Hashes.

Sample benchmark script:

```crystal
require "benchmark"

empty_hash = {} of String => String
small_hash = {"a" => "a"}
large_hash = {} of String => String
10_000.times { |i| large_hash["a#{i}"] = "a#{i}" }

Benchmark.ips do |x|
  x.report("empty Hash") { empty_hash.has_key?("foo") }
  x.report("small Hash, key exists") { small_hash.has_key?("a") }
  x.report("small Hash, key doesn't exist") { small_hash.has_key?("foo") }
  x.report("large Hash, key exists") { large_hash.has_key?("a1234") }
  x.report("large Hash, key doesn't exist") { large_hash.has_key?("foo") }
end
```

Without this commit:

```
                   empty Hash   70.32M ( 14.22ns) (±8.77%)
       small Hash, key exists   80.63M (  12.4ns) (±6.42%)
small Hash, key doesn't exist   75.42M ( 13.26ns) (±7.84%)
       large Hash, key exists   53.28M ( 18.77ns) (±7.98%)
large Hash, key doesn't exist   58.86M ( 16.99ns) (±8.41%)
```

With this commit:

```
                   empty Hash  309.52M (  3.23ns) (±6.53%)
       small Hash, key exists   81.66M ( 12.25ns) (±6.41%)
small Hash, key doesn't exist   75.84M ( 13.19ns) (±6.12%)
       large Hash, key exists   53.34M ( 18.75ns) (±6.16%)
large Hash, key doesn't exist   62.54M ( 15.99ns) (±7.20%)
```
@sdogruyol
Copy link
Member

This was an interesting catch 🥇 Thanks @Fryguy 🙏

@ysbaddaden ysbaddaden merged commit 53b208c into crystal-lang:master Oct 14, 2017
@yxhuvud
Copy link
Contributor

yxhuvud commented Oct 14, 2017

These benchmarks are quite suspect, as they basically only verify that the CPU branch prediction work.

@Fryguy Fryguy deleted the faster_empty_hash_lookup branch October 17, 2017 15:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants