Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve Ruby's Set performance #4334

Closed
kares opened this issue Nov 24, 2016 · 19 comments
Closed

improve Ruby's Set performance #4334

kares opened this issue Nov 24, 2016 · 19 comments
Assignees
Milestone

Comments

@kares
Copy link
Member

kares commented Nov 24, 2016

... seems that Java's HashSet is constantly twice as fast on include? these days :

require 'set'
require 'java'
require 'benchmark'

TIMES = (ARGV[0] || 5).to_i

CONTENTS = (1..36).to_a

LOOP = 10_000_000

TIMES.times do
  Benchmark.bm(10) do |bm|
    bm.report('Ruby Set#include? hit ') do
      set = Set.new CONTENTS
      hit = 32
      LOOP.times { set.include?(hit) }
    end
    bm.report('Ruby Set#include? miss') do
      set = Set.new CONTENTS
      miss = 0
      LOOP.times { set.include?(miss) }
    end

    bm.report('HashSet#include? hit  ') do
      set = java.util.HashSet.new CONTENTS
      hit = 32
      LOOP.times { set.include?(hit) }
    end
    bm.report('HashSet#include? miss ') do
      set = java.util.HashSet.new CONTENTS
      miss = 0
      LOOP.times { set.include?(miss) }
    end
  end
end
                 user     system      total        real
Ruby Set#include? hit   1.780000   0.000000   1.780000 (  1.092435)
Ruby Set#include? miss  1.280000   0.000000   1.280000 (  1.014602)
HashSet#include? hit    0.710000   0.000000   0.710000 (  0.540562)
HashSet#include? miss   0.660000   0.000000   0.660000 (  0.524503)
                 user     system      total        real
Ruby Set#include? hit   0.850000   0.000000   0.850000 (  0.846941)
Ruby Set#include? miss  0.910000   0.000000   0.910000 (  0.901849)
HashSet#include? hit    0.480000   0.000000   0.480000 (  0.478561)
HashSet#include? miss   0.470000   0.000000   0.470000 (  0.465833)
                 user     system      total        real
Ruby Set#include? hit   0.840000   0.000000   0.840000 (  0.841024)
Ruby Set#include? miss  0.910000   0.000000   0.910000 (  0.904074)
HashSet#include? hit    0.480000   0.000000   0.480000 (  0.474359)
HashSet#include? miss   0.460000   0.000000   0.460000 (  0.465034)
                 user     system      total        real
Ruby Set#include? hit   0.850000   0.000000   0.850000 (  0.846504)
Ruby Set#include? miss  0.910000   0.000000   0.910000 (  0.905145)
HashSet#include? hit    0.470000   0.000000   0.470000 (  0.475149)
HashSet#include? miss   0.480000   0.000000   0.480000 (  0.469817)
                 user     system      total        real
Ruby Set#include? hit   0.830000   0.000000   0.830000 (  0.832843)
Ruby Set#include? miss  0.990000   0.000000   0.990000 (  0.927099)
HashSet#include? hit    0.480000   0.000000   0.480000 (  0.479107)
HashSet#include? miss   0.490000   0.000000   0.490000 (  0.468669)

NOTE: Java's Set#include? used to be slow but has been optimized along the 9K line

@enebo
Copy link
Member

enebo commented Nov 28, 2016

@kares what do you plan on changing? Ruby's Set is implemented in Ruby itself and is backed by a Ruby Hash. Or are you thinking of a full Java native version?

@kares
Copy link
Member Author

kares commented Nov 28, 2016

@enebo hey! no plan so far. thought I report - to have some feedback whether its preferable the way it is.
is there any reason why it wasn't implemented in native (it's used extensibly by Rails) besides having less to worry about ?

@enebo
Copy link
Member

enebo commented Nov 28, 2016

@kares maintenance would be a primary reason but if set is significant for something as important as Rails perhaps we should be looking at things that way?

@kares
Copy link
Member Author

kares commented Nov 29, 2016

looked into whether it gets even slower with Rails (due AS patching) ... seems that Set is somehow the only Ruby std library piece they leave as is :)

                 user     system      total        real
Ruby Set#include? hit   0.660000   0.000000   0.660000 (  0.664855)
Ruby Set#include? miss  0.760000   0.000000   0.760000 (  0.753341)
HashSet#include? hit    0.470000   0.000000   0.470000 (  0.456537)
HashSet#include? miss   0.460000   0.000000   0.460000 (  0.447680)

.. with active_support/core_ext loaded - guess it's not worth the effort unless real-world use-cases shows up.

@headius
Copy link
Member

headius commented Nov 29, 2016

set.rb definitely could use a rework. I've thought about writing it in Java a few times to get MAXIMUM SPEED but maintaining the "hash wrapper" behavior is tricky.

I'm actually rather impressed we only lose this much for the set.rb impl, since every operation is wrapped by Ruby code.

@headius
Copy link
Member

headius commented Nov 29, 2016

Oh other rework item needed: make Set thread-safe.

@kares
Copy link
Member Author

kares commented Nov 30, 2016

@headius yy, also kind of expected it to be slower (maybe other ops are - only micro-benched include?)
I am not sure why it should be made thread-safe - none of the core structures are (maybe concurrent-ruby should include such collection: Concurrent::Set)

@headius
Copy link
Member

headius commented Dec 2, 2016

One item slowing this (and other) Set methods down is the use of Hash#[], which to IR looks like it might need a frame (for $~ in String#[]). If I modify Set#include? to call Hash#include? instead of Hash#[], things improve a bit:

before/after:

[] ~/projects/jruby $ jruby set_bench.rb 
                 user     system      total        real
Ruby Set#include? hit   2.010000   0.020000   2.030000 (  1.120571)
Ruby Set#include? miss  1.350000   0.000000   1.350000 (  1.120890)
                 user     system      total        real
Ruby Set#include? hit   0.880000   0.010000   0.890000 (  0.879133)
Ruby Set#include? miss  1.050000   0.000000   1.050000 (  1.036543)
                 user     system      total        real
Ruby Set#include? hit   0.870000   0.000000   0.870000 (  0.864939)
Ruby Set#include? miss  1.040000   0.000000   1.040000 (  1.034190)
                 user     system      total        real
Ruby Set#include? hit   0.900000   0.000000   0.900000 (  0.897126)
Ruby Set#include? miss  1.040000   0.010000   1.050000 (  1.037943)
                 user     system      total        real
Ruby Set#include? hit   0.840000   0.000000   0.840000 (  0.841677)
Ruby Set#include? miss  1.050000   0.000000   1.050000 (  1.041484)

[] ~/projects/jruby $ jruby set_bench.rb 
                 user     system      total        real
Ruby Set#include? hit   1.750000   0.020000   1.770000 (  0.967469)
Ruby Set#include? miss  1.320000   0.010000   1.330000 (  0.943466)
                 user     system      total        real
Ruby Set#include? hit   0.800000   0.000000   0.800000 (  0.804858)
Ruby Set#include? miss  0.880000   0.010000   0.890000 (  0.882824)
                 user     system      total        real
Ruby Set#include? hit   0.800000   0.000000   0.800000 (  0.800321)
Ruby Set#include? miss  0.860000   0.000000   0.860000 (  0.852134)
                 user     system      total        real
Ruby Set#include? hit   0.800000   0.000000   0.800000 (  0.800266)
Ruby Set#include? miss  0.850000   0.000000   0.850000 (  0.856337)
                 user     system      total        real
Ruby Set#include? hit   0.900000   0.000000   0.900000 (  0.799154)
Ruby Set#include? miss  0.860000   0.000000   0.860000 (  0.847760)

I'll open a bug with MRI to change that include? impl.

@enebo
Copy link
Member

enebo commented Dec 2, 2016

@headius they have an optaref instr in yarv which will not dyndispatch in hash case for []. Have you timed this bench using this change with MRI?

@headius
Copy link
Member

headius commented Dec 2, 2016

@headius
Copy link
Member

headius commented Dec 2, 2016

@enebo Bah, you're right. https://bugs.ruby-lang.org/issues/10754

They are just a bit slower on the bench if it uses Hash#include?.

They probably will reject my change, but I guess this means we could implement it using Hash#include? if we want to.

@kares
Copy link
Member Author

kares commented Dec 2, 2016

heh, najs trick - esp. the Hash.new(false) part to get [] to return false for missing keys :)
... seems like set.rb is hacked with MRI internals in mind -> no point for JRuby not to do its own bits.

@kares
Copy link
Member Author

kares commented Dec 7, 2016

decided to look into that -> realized that I also miss JI for Ruby's Set and the set.rb code is a bit ugly on places :

  • SortedSet doing eval on first initialize,
  • singleton class eval in Set's divide algorithm.
  • Thread.current in inspect due recursion detection (JRuby has native API for doing that)

all seem unnecessary + there's no consistency on API (Set#add seems to be an API to use for all additions but its not -> there's certainly room for improvements when going native)

hopefully I will manage to get this acceptable for Ruby 2.4 (and do not change my mind 🎱)

@kares kares self-assigned this Dec 7, 2016
@headius
Copy link
Member

headius commented Dec 7, 2016

@kares Ahh, so you're looking into some set.rb improvements! That's great! Try to make sure you isolate jruby-specific stuff so merging from MRI upstream doesn't get too complicated. I'm not opposed to having a native ext if there's something really hot to go there, but the overall performance of set.rb shouldn't usually be limited by the Ruby code.

One place it may be necessary would be synchronization; if we decide we want Set to be thread-safe, adding block-based synchronization as in Monitor would be a significant performance hit. Having some native synchronization would help there.

Of course, I still hold out hope that some day we might have thread-safe Array and Hash anyway.

@headius
Copy link
Member

headius commented Dec 7, 2016

Oh one other thought about the [] calls that were slowing down include?: we have talked about adding pragmas in the past, and this would be a good case for it. Specifically, we could have a pragma that indicates to the IR compiler that this is NOT going to be a String#[Regexp] call that would require backref logic. It would be a way of saying "I know this method or call will never need backref".

Of course we still plan to find a way to lazily allocate space for backref only when it's needed, but for now static compiler tricks are not bad.

@enebo
Copy link
Member

enebo commented Dec 7, 2016

I wonder if we could do this another way?

@lookup = Hash.method(:[])

#...

def include?(element)
  @lookup.call(@hash, element)
end

I don't neccesarily mean using unbound method it could be our own API which returns a callable thing but we will know what it is because it is just a method. IR could then know for our thing what it is and make it into a static dispatch.

@kares
Copy link
Member Author

kares commented Dec 8, 2016

I am still interested in going native (full set.rb rewrite in Java). set.rb is not changing that much (and isn't actually used beyond some level -> surprised to no find any SortedSet usage in Rails). anyway, SortedSet is quite ugly (behaves differently whether a particular gem is installed) and could be implemented easily with Java's TreeSet.

what I mostly desire - beyond performance - is Set implements java.util.Set (like RubyArray, RubyHash do -> no collection conversion between Java <-> Ruby)

@kares
Copy link
Member Author

kares commented Dec 8, 2016

looks promising, 9.1.6.0 (set.rb) :

                 user     system      total        real
Ruby Set#include? hit   0.870000   0.000000   0.870000 (  0.866756)
Ruby Set#include? miss  0.930000   0.000000   0.930000 (  0.912139)
HashSet#include? hit    0.500000   0.000000   0.500000 (  0.493965)
HashSet#include? miss   0.510000   0.000000   0.510000 (  0.506165)
     Set.new([1,2,3])    8.630000   0.000000   8.630000 (  8.590427)
SortedSet.new([1,2,3])  14.520000   0.030000  14.550000 ( 14.508678)
     Set#inspect         9.150000   0.010000   9.160000 (  9.137015)
SortedSet#inspect        5.900000   0.010000   5.910000 (  5.836307)
     Set#flatten         8.610000   0.010000   8.620000 (  8.602421)
     Set#each {|e| e}    1.540000   0.000000   1.540000 (  1.533745)

... native Set/SortedSet (without call site-caching) :

                 user     system      total        real
Ruby Set#include? hit   0.460000   0.000000   0.460000 (  0.461569)
Ruby Set#include? miss  0.520000   0.010000   0.530000 (  0.521436)
HashSet#include? hit    0.490000   0.000000   0.490000 (  0.486099)
HashSet#include? miss   0.460000   0.000000   0.460000 (  0.455156)
     Set.new([1,2,3])    2.080000   0.000000   2.080000 (  2.076242)
SortedSet.new([1,2,3])   4.310000   0.000000   4.310000 (  4.299778)
     Set#inspect         4.830000   0.000000   4.830000 (  4.821318)
SortedSet#inspect        4.770000   0.000000   4.770000 (  4.757981)
     Set#flatten         1.710000   0.010000   1.720000 (  1.701329)
     Set#each {|e| e}    1.690000   0.000000   1.690000 (  1.691799)

its roughly ~ the same for max compatibility (e.g. has the @hash instance-var on Set instances)

@kares
Copy link
Member Author

kares commented Jun 20, 2018

has been delivered in 9.2.0.0

@kares kares closed this as completed Jun 20, 2018
@kares kares added this to the JRuby 9.2.0.0 milestone Jun 20, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants