
Fixnum/Float 'fast-ops' working even when re-opened #4736

Merged
merged 12 commits on Jun 27, 2019

Conversation

kares
Member

@kares kares commented Aug 9, 2017

instead of relying on isFixnumReopened and similar flags,
we check at the call-site whether the cached method isBuiltin

while it's slightly slower, it does not degrade whenever Fixnum/Float gets re-opened ...
... thus it also works (the fast-op path still gets triggered) with active_support/core_ext loaded

for polymorphic-style behaviour (e.g. with +) deciding on a Fixnum/Float argument, there are now 2 caches used, so it should play nicely (not invalidate) if a site does 1+1, 1+1.0, 1+1 in a row ... let me know if it's premature :)
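The two-cache idea can be illustrated with a toy dispatcher (all names here are hypothetical, not the actual JRuby call-site classes): with a single monomorphic slot, a 1+1, 1+1.0, 1+1 sequence evicts the cache on every type switch, while two per-type slots miss only once per type.

```java
// Hypothetical sketch of why a bimorphic (two-slot) call-site cache helps:
// alternating Fixnum (Long) and Float (Double) arguments no longer evict
// each other. Not JRuby code; miss counters stand in for real re-caching.
public class BimorphicCacheDemo {
    static int monoMisses = 0, biMisses = 0;
    static Class<?> monoSlot;                  // single shared cache slot
    static Class<?> biFixnumSlot, biFloatSlot; // one slot per expected kind

    static void callMono(Object arg) {
        if (monoSlot != arg.getClass()) { monoSlot = arg.getClass(); monoMisses++; }
    }

    static void callBi(Object arg) {
        if (arg instanceof Long) {
            if (biFixnumSlot != Long.class) { biFixnumSlot = Long.class; biMisses++; }
        } else if (arg instanceof Double) {
            if (biFloatSlot != Double.class) { biFloatSlot = Double.class; biMisses++; }
        }
    }

    public static void main(String[] args) {
        Object[] sequence = { 1L, 1.0, 1L }; // mirrors 1+1, 1+1.0, 1+1
        for (Object arg : sequence) { callMono(arg); callBi(arg); }
        System.out.println("mono misses: " + monoMisses); // 3 (every switch misses)
        System.out.println("bi misses:   " + biMisses);   // 2 (one fill per kind)
    }
}
```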

some numbers

... really just some quick "bmbm" runs to confirm it's getting better and also not getting worse :)

  1. before the change (no AS loaded)
Rehearsal ----------------------------------------------------------------
1 + 2 + 12 + 99 [100000000x]   3.970000   0.000000   3.970000 (  3.582797)
2 + 1 [100000000x]             2.990000   0.000000   2.990000 (  2.864492)
------------------------------------------------------- total: 6.960000sec

                                   user     system      total        real
1 + 2 + 12 + 99 [100000000x]   3.320000   0.000000   3.320000 (  3.314471)
2 + 1 [100000000x]             2.530000   0.010000   2.540000 (  2.535214)
  2. before the change (AS loaded)
Rehearsal ----------------------------------------------------------------
1 + 2 + 12 + 99 [100000000x]   6.330000   0.010000   6.340000 (  6.136795)
2 + 1 [100000000x]             3.490000   0.000000   3.490000 (  3.313757)
------------------------------------------------------- total: 9.830000sec

                                   user     system      total        real
1 + 2 + 12 + 99 [100000000x]   5.930000   0.000000   5.930000 (  5.920239)
2 + 1 [100000000x]             3.090000   0.000000   3.090000 (  3.091332)
  3. after the change (AS loaded)
Rehearsal ----------------------------------------------------------------
1 + 2 + 12 + 99 [100000000x]   4.640000   0.010000   4.650000 (  4.236394)
2 + 1 [100000000x]             3.030000   0.000000   3.030000 (  2.839032)
------------------------------------------------------- total: 7.680000sec

                                   user     system      total        real
1 + 2 + 12 + 99 [100000000x]   4.150000   0.000000   4.150000 (  4.107868)
2 + 1 [100000000x]             2.680000   0.000000   2.680000 (  2.683514)
  4. after the change (AS not loaded)
Rehearsal ----------------------------------------------------------------
1 + 2 + 12 + 99 [100000000x]   4.630000   0.000000   4.630000 (  4.137725)
2 + 1 [100000000x]             3.090000   0.000000   3.090000 (  2.986892)
------------------------------------------------------- total: 7.720000sec

                                   user     system      total        real
1 + 2 + 12 + 99 [100000000x]   3.760000   0.010000   3.770000 (  3.769681)
2 + 1 [100000000x]             2.850000   0.000000   2.850000 (  2.784346)

@kares kares requested a review from headius August 9, 2017 11:17
@kares kares added this to the JRuby 9.2.0.0 milestone Aug 9, 2017
Member

@headius headius left a comment

The patch seems pretty straightforward. I guess you added a second cache so you wouldn't be blowing out the Fixnum cache with Float values? I'm unclear on that change but the rest is fine.

@kares
Member Author

kares commented Aug 17, 2017

I guess you added a second cache so you wouldn't be blowing out the Fixnum cache with Float values? I'm unclear on that change but the rest is fine.

exactly - seemed like it would make sense, since we manage that call-site manually: we know when a Fixnum and when a Float class is to be expected (unlike a generic, truly polymorphic cache).

@headius
Member

headius commented Aug 18, 2017

Seems ok to me then. I see there's no patch for indy, so that might be some additional work worth doing.

@kares
Member Author

kares commented Aug 18, 2017

thanks, will merge this one then ... will check what I can do about indy but no promises :)

@kares
Member Author

kares commented Aug 21, 2017

so I see what you meant - the fixnum/float invalidators are still based on the Fixnum/Float re-opening logic.
a good enough challenge for me to finally write some indy code ... what can be done is performing the built-in check in a method-handle guard. I'm on it.

btw. I've had a related/similar question on my mind for a while - it seems a bit more complicated, but if we wanted to get rid of the isBuiltin check completely we would need to observe changes to a RubyClass -> it could dispatch to listeners with details every time a method is added/removed. what's the flaw in general with such an idea?
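Concretely, the listener idea being asked about might look like the following rough sketch (all names are hypothetical, not JRuby APIs): the class model eagerly notifies every registered site on each method definition change.

```java
// Hypothetical sketch of "active invalidation": a class notifies registered
// listeners whenever a method is added or removed. The eager notification is
// exactly the cost discussed below - every registered site pays on each
// change, even sites that will never execute again.
import java.util.ArrayList;
import java.util.List;

public class ActiveInvalidationSketch {
    interface MethodChangeListener {
        void methodChanged(String name); // told which method was added/removed
    }

    static class RubyClassModel {
        private final List<MethodChangeListener> listeners = new ArrayList<>();
        void addListener(MethodChangeListener l) { listeners.add(l); }
        // Eager dispatch to every listener on each definition change.
        void defineMethod(String name) {
            for (MethodChangeListener l : listeners) l.methodChanged(name);
        }
    }

    static final List<String> invalidated = new ArrayList<>();

    public static void main(String[] args) {
        RubyClassModel fixnum = new RubyClassModel();
        fixnum.addListener(invalidated::add); // a call-site registering itself
        fixnum.defineMethod("+");             // e.g. active_support re-opening Fixnum
        System.out.println(invalidated);      // [+]
    }
}
```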

@enebo
Member

enebo commented Aug 21, 2017

@kares early on we tried that without much success (we called it active invalidation at the time). There are a couple of issues: 1) eager invalidation is extra execution cost for code that is never hit again; 2) concurrency consistency (unless we lock down the world, one site may be updated while the next one has not been, so it fires the wrong method version). Some inconsistency is just the indeterminism of threading, but I think this sort of invalidation goes beyond that.

A second point for invokedynamic with invalidation is switchpoint invalidation is incredibly cheap.
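The mechanism behind that second point can be shown standalone: `java.lang.invoke.SwitchPoint` is a real JDK class with exactly this shape; only the fast/slow methods here are placeholder stand-ins for a built-in op and its fallback.

```java
// java.lang.invoke.SwitchPoint is what makes indy invalidation cheap: until
// invalidated, the guarded handle dispatches straight to the fast target with
// essentially no per-call cost; after SwitchPoint.invalidateAll it permanently
// falls back. Standalone illustration, not JRuby code.
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.SwitchPoint;

public class SwitchPointDemo {
    static int fast(int a, int b) { return a + b; }        // e.g. built-in Fixnum#+
    static int slow(int a, int b) { return a + b + 1000; } // marker for the fallback path

    static int[] demo() throws Throwable {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodType type = MethodType.methodType(int.class, int.class, int.class);
        MethodHandle fastH = lookup.findStatic(SwitchPointDemo.class, "fast", type);
        MethodHandle slowH = lookup.findStatic(SwitchPointDemo.class, "slow", type);

        SwitchPoint sp = new SwitchPoint();
        MethodHandle guarded = sp.guardWithTest(fastH, slowH);

        int before = (int) guarded.invokeExact(1, 2);       // fast path
        SwitchPoint.invalidateAll(new SwitchPoint[]{ sp }); // e.g. Fixnum re-opened
        int after = (int) guarded.invokeExact(1, 2);        // fallback path
        return new int[]{ before, after };
    }

    public static void main(String[] args) throws Throwable {
        int[] r = demo();
        System.out.println(r[0] + " " + r[1]); // 3 1003
    }
}
```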

@headius
Member

headius commented Aug 23, 2017

I did experiment with including a switchpoint inside each method object, so that if that method were ever overwritten we could do a one-shot invalidation of all sites that might have cached it. This would reduce the call site churn caused by invalidating a single class and having all methods cached from it get kicked out, even if they're still valid.

I don't recall exactly what problems I ran into but I do remember that determining when to invalidate was tricky. You need to invalidate a method if any subclass overrides it, which meant all new methods had to walk up the hierarchy looking for the method they might be shadowing and then invalidate that method. This ended up being a very high startup cost even if it might save us walking down the hierarchy in the current model.

It's something to consider, though...there may still be an efficient way to invalidate all call sites containing a given method without invalidating the whole class.

@headius
Member

headius commented Aug 29, 2017

I pushed something similar to my experiment from years ago to the active_invalidation branch. The strategy there works a bit differently than I described above:

  • Each CacheEntry, which is immutable, gets a new, valid SwitchPoint.
  • When redefining a method, a search is done for that method name and any cached entry is invalidated.
  • All uses of CacheEntry check that it is still valid before proceeding.

This would eliminate the down-hierarchy search required to invalidate all descendant classes, replacing it with a usually shorter, finer-grained search up the hierarchy for a single cache entry.

It would also mean we are invalidating methods based both on the class they're in and the name of the method, reducing the impact on unrelated method names.
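The per-method strategy described in the bullets above might be sketched like this (the `CacheEntry` class and the map are illustrative stand-ins, not the branch's actual code; only `SwitchPoint` is the real JDK class):

```java
// Sketch of per-method invalidation: each immutable cache entry carries its
// own SwitchPoint, so redefining one method name invalidates only the sites
// that cached that method, leaving other methods of the class untouched.
import java.lang.invoke.SwitchPoint;
import java.util.HashMap;
import java.util.Map;

public class PerMethodInvalidationSketch {
    // Immutable: a redefinition creates a new entry rather than mutating this one.
    static final class CacheEntry {
        final String methodName;
        final SwitchPoint switchPoint = new SwitchPoint();
        CacheEntry(String methodName) { this.methodName = methodName; }
        boolean isValid() { return !switchPoint.hasBeenInvalidated(); }
    }

    static final Map<String, CacheEntry> cache = new HashMap<>();

    static CacheEntry lookup(String name) {
        return cache.computeIfAbsent(name, CacheEntry::new);
    }

    // Redefinition searches for the one cached name and trips only its
    // SwitchPoint, instead of bumping a class-wide serial that kicks out
    // every cached method of the class.
    static void redefine(String name) {
        CacheEntry old = cache.remove(name);
        if (old != null) SwitchPoint.invalidateAll(new SwitchPoint[]{ old.switchPoint });
    }

    public static void main(String[] args) {
        CacheEntry plus = lookup("+");
        CacheEntry minus = lookup("-");
        redefine("+");                       // only "+" sites invalidate
        System.out.println(plus.isValid());  // false
        System.out.println(minus.isValid()); // true
    }
}
```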

However, it doesn't pass. I haven't dug into the failures.

@enebo
Member

enebo commented Aug 29, 2017

@headius you are using the same term, but it is different than what @kares suggested and what I described about the past. This is what I would consider fine-grained inactive invalidation, since a CacheEntry would only invalidate and be replaced upon getting called after the method has changed. Perhaps we need different vocabulary? A method changing and notifying all sites (the old days' active invalidation) is probably better called eager. What we do today is lazy. What your branch does is method-grained, and what we do today is type-grained. This last distinction matters because runtime-grained also exists, and no doubt other interesting subsets. Seems a fair description?

@headius
Member

headius commented Sep 5, 2017

@enebo Yes, that's not a bad description.

My branch was an attempt to make invalidation work like it does today, but on a method slot basis. So only call sites that cache a method that gets replaced or overridden would invalidate.

Granularity-wise, this is close to the finest grain we can invalidate. Where currently we invalidate only based on the serial number of the class (and method changes spin all serials below), this additionally separates invalidation based on the name and where the class and method are in the hierarchy (e.g. invalidating foo in a parent class doesn't need to invalidate children that never saw it).

@headius
Member

headius commented Oct 11, 2018

@kares This is marked for 9.2.1. Can it be updated?

@kares kares force-pushed the test-call-site-2 branch 3 times, most recently from 0764282 to ff52286 Compare October 17, 2018 10:27
@headius headius modified the milestones: JRuby 9.2.1.0, JRuby 9.2.2.0 Nov 1, 2018
@enebo enebo removed this from the JRuby 9.2.5.0 milestone Dec 6, 2018
@kares kares requested review from headius and enebo June 5, 2019 11:21
Member

@enebo enebo left a comment

The two-level cache class and method names have bugged me. The 2 represents looking at the second cache instead of the first, but it reads like the 2 in rb_new_ary2. Could the three of us maybe come up with another naming convention here?

TwoLevelCachingCallsite? isbuiltin2 is one I don't have any idea on...

@kares
Member Author

kares commented Jun 6, 2019

since NormalCachingCallSite is kind of a MonomorphicCachingCallSite,
in that sense NormalCachingCallSite2 could become a BimorphicCachingCallSite
or a BiNormalCachingCallSite ... not sure if that sounds weird or not 😉

the class is an internal base class - the naming is more a reflection of having cache2 (along with cache),
so all 'duplicate' methods simply got the 2 suffix - I haven't had a better idea without refactoring NormalCachingCallSite extensively ... but I am open to suggestions

@kares
Member Author

kares commented Jun 6, 2019

as promised ... here are some numbers: https://gist.github.com/kares/7188d075d614ed023ccad31cf6d968bb
for indy=true it might not be a "fair" comparison, since when Fixnum/Float is re-opened we always invalidate - but re-opening is common, e.g. in a Rails env or even when loading some stdlib parts (bigdecimal/util.rb)

@enebo
Member

enebo commented Jun 6, 2019

@kares re-naming... yeah, I am having a hard time with a name here too. Perhaps @headius will get inspiration. 2 can be Bi or Two or Dual. The method names themselves are even more challenging to me :(

@kares It is neat that math will be faster in Rails apps with this. I wonder if we do enough math in ARJDBC to ever notice? (not that it matters - I just hope we see a positive effect there anyway).

    static boolean fixnumTestWithGen(IRubyObject self, RubyClass klass, final int gen) {
        return self instanceof RubyFixnum && klass.getGeneration() == gen;
    }
Member

This is a nice addition but ideally the indy way would be to use a switch point rather than a generation check. The generation check will always require traversing the class and the field, where the switch point is compiled as a safe point with basically zero overhead.

This is fine for now; we have continued to flip the generation since many other places can't use switch point, but the latter will optimize better.
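The two guard styles under discussion can be contrasted in plain method-handle code (a hypothetical standalone sketch, not JRuby's actual binding logic): a generation guard re-executes a field comparison on every call, while a SwitchPoint guard stays on the fast path for free until tripped.

```java
// Generation-check guard vs. SwitchPoint guard. The generation test runs on
// every dispatch; the SwitchPoint path is compiled as unconditional until
// invalidated. All class/field names here are illustrative.
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.SwitchPoint;

public class GuardStylesSketch {
    static volatile int generation = 0; // bumped whenever the class is modified
    static final int CACHED_GEN = 0;    // generation observed when the cache was filled

    // Generation-style guard: active work on every single call.
    static boolean genTest() { return generation == CACHED_GEN; }

    static String fast() { return "fast"; }
    static String fallback() { return "fallback"; }

    static String[] demo() throws Throwable {
        MethodHandles.Lookup l = MethodHandles.lookup();
        MethodType str = MethodType.methodType(String.class);
        MethodHandle fastH = l.findStatic(GuardStylesSketch.class, "fast", str);
        MethodHandle fallH = l.findStatic(GuardStylesSketch.class, "fallback", str);
        MethodHandle testH = l.findStatic(GuardStylesSketch.class, "genTest",
                MethodType.methodType(boolean.class));

        // Style 1: explicit test handle, executed per call.
        MethodHandle genGuarded = MethodHandles.guardWithTest(testH, fastH, fallH);
        // Style 2: SwitchPoint; zero overhead while still valid.
        MethodHandle spGuarded = new SwitchPoint().guardWithTest(fastH, fallH);

        String first = (String) genGuarded.invokeExact();
        generation++;                                      // simulate the class changing
        String second = (String) genGuarded.invokeExact(); // generation check now fails
        String third = (String) spGuarded.invokeExact();   // switch point never tripped
        return new String[]{ first, second, third };
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(String.join(",", demo())); // fast,fallback,fast
    }
}
```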

Member Author

@kares kares Jun 25, 2019

this was an addition on top of the method invalidation, as explained:
https://github.com/jruby/jruby/pull/4736/files#r297135999
we can get rid of it and see how it goes.

    target = ((SwitchPoint) context.runtime.getFloatInvalidator().getData())
            .guardWithTest(target, fallback);

    target = ((SwitchPoint) classFloat.getInvalidator().getData()).guardWithTest(target, fallback);
Member

This is a bit confusing to me...why do we need to check both the SwitchPoint and the generation? They should both invalidate on class hierarchy changes.

Member Author

okay, will remove.
this is something I wasn't sure about due to the comment history, where it was noted that the callback invalidation approach hadn't worked well in the past. so I did the generation check as well, as an attempt to invalidate methods "early" (besides the class-level method invalidator)

Member

@headius headius left a comment

No objections for merging. Only concern is with the use of both generation and switchpoint to invalidate the math operations. Unless I've missed some detail, I think only one should be required.

@headius
Member

headius commented Jun 24, 2019

I have submitted a technical review. As for the naming...I'm with @enebo that "2" is not descriptive enough.

I'm thinking along these lines:

NormalCachingCallSite becomes a common abstract parent to MonomorphicCallSite and BimorphicCallSite ("caching" seems irrelevant at this level), or goes away entirely (a binary compat issue; perhaps it just becomes an empty class).

MonomorphicCallSite keeps the methods named without any numbering, since those are definitely used inside and outside of JRuby.

BimorphicCallSite methods are trickier. I would think of the second cache entry as "secondary" but maybe that's a little long? isSecondaryBuiltin, retrieveSecondaryCache, cacheAndGetSecondary...maybe it's not too bad for code that's intended only to be used internally?

@kares
Member Author

kares commented Jun 25, 2019

okay, PR feedback should be addressed by now,
will need to do a rebase due to OracleJDK 11 failing on CI + a conflict resolution, but otherwise hopefully all good ...

@kares kares requested a review from enebo June 25, 2019 20:13
@kares kares force-pushed the test-call-site-2 branch from a6c878c to 563106a Compare June 26, 2019 06:18