JRuby counterpart for RubyVM.stat #4384
Looks like RubyVM.stat produces a hash of three values: global_method_state, global_constant_state, and class_serial.
On JRuby there's no equivalent for global_method_state since we do not invalidate methods globally. However, this could be emulated by simply adding another global serial number; we would not use it internally, but it would indicate how many invalidation events had occurred.
Similarly there's no one global_constant_state because we invalidate constants on a per-name basis (MRI may do something similar, so I'm curious what that value means for them). This could also be simulated by adding an additional serial number.
And I'm guessing here, but I think class_serial is basically just a monotonically-increasing class ID, so you can see if any new classes have been added. This exists in JRuby, but unfortunately it is not exposed right now. The best you'd be able to do would be to define a new class and confirm its ID is one higher than the previous value.
Note also that what does exist is all accessible from Ruby today, since JRuby can reflect back into itself:
p JRuby.runtime.alloc_module_id # n
Class.new # two classes created
p JRuby.runtime.alloc_module_id # n + 3 |
It might be helpful, but after experimenting a little, it seems like in MRI's case the class serial is the one most helpful for us. Take this class:
class C
  def foo
  end
end
Both of the following snippets increase it:
class C
  def bar
  end
end
def C.baz
  123
end
This is equivalent to checking But while it might be "good enough" for our use case of "all methods cache" invalidation, it doesn't work like
irb(main):011:0> JRuby.runtime.allocModuleId
=> 1605
irb(main):012:0> class C
irb(main):013:1> def bar
irb(main):014:2> end
irb(main):015:1> end
=> nil
irb(main):016:0> JRuby.runtime.alloc_module_id
=> 1608
irb(main):017:0> class C
irb(main):018:1> def bar
irb(main):019:2> 3
irb(main):020:2> end
irb(main):021:1> end
=> nil
irb(main):022:0> JRuby.runtime.alloc_module_id
=> 1609
...
irb(main):035:0> JRuby.runtime.alloc_module_id
=> 1622
irb(main):036:0> def C.baz
irb(main):037:1> asdasd
irb(main):038:1> end
=> nil
irb(main):039:0> JRuby.runtime.alloc_module_id
=> 1623 |
There's a simple reason for this: in order to reduce cache thrashing on singleton objects, we do not assign a new ID to a singleton class until it defines a method. The justification here is that the singleton class is indistinguishable from the parent class for method-caching purposes (the main reason for this ID) so there's no need to invalidate call sites that have seen the parent. And not to put too fine an edge on it, but you're depending on very internal behavior of both implementations. You're lucky we still even have the class ID...we don't need it these days. The down side of incrementing like MRI is not clear. I don't know how common it is to create a new singleton object but never define methods or include modules into it. Your case here is pretty narrow. |
FWIW at this point we could add something like |
Are you referring to the In the first example, the problem is that redefining a method doesn't change the alloc id. Neither does this happen if I reopen an existing class and define a new method.
Sorry, which of the examples are you referring to?
The only thing we're depending on is that We're not depending on the internal structure of the returned value. |
Why not add a |
You know what, you're right, and that is odd. I would have expected it to change. For some reason I read your example like
Well, you're depending on that change happening for the same events across runtimes. JRuby goes much farther than MRI in optimizing some of these areas, and we won't invalidate method caches nearly as often. That's why it only seems feasible to me to provide the class ID, since we already have that concept for cache identities. If you're ok with the limitation that only visibly different classes will trigger a new ID, then what we have today should work fine (after I figure out your singleton example).
Indeed, that's an alternative. Of course, you managed to get at the ID without us adding anything...that's by design. What I'd prefer is a standard API that both JRuby and MRI implement. Until then, I'm willing to add blessed interfaces for some of these VM stats to avoid having you bind to internal APIs. Probably under JRuby::VM. |
I took a second look at the code and I have an explanation: we eagerly create the class's singleton class at construction time. So in this case, I believe it's working properly: no new class has been created. MRI may defer singleton class creation until it is needed. |
Rather, I'm depending on it on recent MRI, and hoping other runtimes will adopt something similar.
When a class has new method added, or a method definition is changed, don't you have to invalidate some cache anyway?
What do you mean by "visibly different"? I think both of my examples result in visibly different classes. Here's another one:
irb(main):001:0> JRuby.runtime.alloc_module_id
=> 1599
irb(main):002:0> class C
irb(main):003:1> def foo
irb(main):004:2> end
irb(main):005:1> end
=> nil
irb(main):006:0> JRuby.runtime.alloc_module_id
=> 1602
irb(main):007:0> class C
irb(main):008:1> def bar
irb(main):009:2> end
irb(main):010:1> end
=> nil
irb(main):011:0> JRuby.runtime.alloc_module_id
=> 1603
The class
That might be good enough if the counter got incremented more often. In particular, I'd like to be confident that it would happen when a user types out a new method and re-loads the source file containing it. Having any slight edit of existing methods result in counter increment as well would be ideal, but not necessary so far, for me.
The MRI team might be reluctant to standardize the stat structure anyway. The doc says it can change in future releases. |
Visibly different from its parent class. We assign a new ID to a class exactly once, when it begins to differ from its parent. From then on it lives as its own class for caching purposes. Adding I think some clarification is needed here. JRuby uses this module ID only to uniquely identify classes. On rare occasions, two different class objects will have the same ID if they do not yet differ. Once one of them differs, it will get its own ID exactly once. This ID is used for type-checking: to see if the incoming receiver object is the same as it was last time (or visibly indistinguishable from what we saw last time). In your case, adding
We also have a separate mechanism that tracks method tables. Each class has its own, separate serial number. This is an indication of the class's current "revision" if you will. Adding So in order to make all the cases you want visible, we'd need to inspect every class in the system and derive from that whether anything had changed. Or add in one more serial number that we don't use, but which you want because it's how MRI does things in an internal, unofficial API. And that's what I mean by depending on implementation-specific details.
Ok, we're on the same page about having our own JRuby-specific VM namespace for these sorts of internal details. However, like RubyVM, I'm not really comfortable blessing such an interface as a supported API, since our internal details have and will change. I think we can come up with something appropriate for your use...but I don't feel like I understand all the cases you want to cover yet. Just asking for something "like MRI" is a bit vague, because we are similar in some ways and very, very different in others. I'm not sure the stat values MRI exposes directly map to anything in JRuby, but parts of them map to parts of us. |
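The two mechanisms described above (a class ID assigned exactly once, plus a separate per-class serial) can be illustrated with a toy model in plain Ruby. This is not JRuby's implementation: ToyClass, its fields, and the ID allocator are hypothetical names, and the model assumes only what the comment states, namely that a class shares its parent's ID until it first differs, and that every method-table change bumps a per-class serial.

```ruby
# Toy model only: ToyClass and its fields are hypothetical, not JRuby internals.
class ToyClass
  @next_id = 0

  class << self
    # Hand out monotonically increasing IDs, like a global class-ID allocator.
    def allocate_id
      @next_id += 1
    end
  end

  attr_reader :id, :serial

  def initialize(parent = nil)
    # Until the class differs from its parent, it shares the parent's ID.
    @id = parent ? parent.id : ToyClass.allocate_id
    @has_own_id = parent.nil?
    @serial = 0
    @method_table = {}
  end

  def define(name, body)
    # First visible difference from the parent: take a fresh ID, exactly once.
    unless @has_own_id
      @id = ToyClass.allocate_id
      @has_own_id = true
    end
    @method_table[name] = body
    @serial += 1 # every method-table change bumps the per-class "revision"
  end
end

base = ToyClass.new
child = ToyClass.new(base)
shared_before = (child.id == base.id) # no difference yet, so the ID is shared

child.define(:foo, -> { 1 })
id_after_first_def = child.id
child.define(:foo, -> { 2 })          # redefinition: same ID, new serial

puts shared_before
puts child.id == id_after_first_def
puts child.serial
```

In this model a watcher that looks only at the highest allocated ID would miss the second definition of foo, which is the gap discussed in this thread; only the per-class serial records it.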
That seems to be the main point. Thanks.
I imagine you also have to check there that the receiver's method table hasn't changed. So JRuby checks two counters (or does something similar), while MRI gets away with checking just one. I suppose it's a tradeoff between having a quicker validation check and invalidating fewer callsites when a method is (re)defined.
One more serial number would be ideal, but exposing method cache ids on each module might also work fast enough (I'd have to test), as well as provide more correct invalidation information. I'd need to know a good way to compute a cache key from them, though. Would summing them up be fast enough? Or if the method cache ids were to be made always-increasing and unique across all modules, I could just take a
I disagree with that characterization because, even if JRuby undergoes architecture changes in the future, it should still be possible to provide the stat information that would be useful for our purposes. Even if the contents of the hash will have to change somehow. |
Allow me to copy-paste from the first message in this issue: We want to have a fast way to determine whether the set of modules, classes and methods in the current runtime didn't change since the last time a certain method was invoked. That's basically the only requirement. So we'll have a certain structure that is derived from all methods in all classes defined in the current runtime. And we want to know when the said structure needs to be rebuilt to accurately reflect the current methods and classes. We'll use it to make code completion faster in Pry (in the "call target is unknown" case). I also anticipate using it in another, non-REPL package, also for code completion. I'm not sure if there are any uses of |
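The requirement above amounts to guarding an expensive "all methods" structure with a cheap change token. Here is a minimal sketch of that pattern; MethodIndex and the token block are hypothetical names for illustration, not Pry's actual code, and the token can be any value comparable with ==.

```ruby
# Sketch: rebuild an "all methods" index only when a change token differs.
# MethodIndex and the token block are hypothetical names, not Pry's code.
class MethodIndex
  attr_reader :rebuilds

  def initialize(&token_source)
    @token_source = token_source # returns any value comparable with ==
    @token = nil
    @index = nil
    @rebuilds = 0
  end

  def lookup
    current = @token_source.call
    if @index.nil? || current != @token
      @index = build
      @token = current
      @rebuilds += 1
    end
    @index
  end

  private

  # The expensive part: one full ObjectSpace scan over every module.
  def build
    names = {}
    ObjectSpace.each_object(Module) do |mod|
      mod.instance_methods(false).each { |m| names[m] = true }
    end
    names.keys
  end
end

# Usage with a stand-in token; on MRI one would pass { RubyVM.stat } instead.
fake_serial = 0
index = MethodIndex.new { fake_serial }
index.lookup     # first call: builds the index
index.lookup     # token unchanged: cached result reused
fake_serial += 1
index.lookup     # token changed: index rebuilt
puts index.rebuilds
```

The token is injected rather than hard-coded so the same cache works with whatever value a given runtime can offer.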
By saying we check these values I was oversimplifying a bit. In both cases, the check is not actively done; it works with what's called a "safepoint" in the JVM. Normally, if nothing changes, there's no overhead. If something changes, the code gets invalidated and never run again, and code already running has to back off. So while we have finer-grained tracking of system changes, it actually translates into much less overhead than MRI.
Sure, but that's exactly what I mean by impl-specific. The contents of that hash reflect how our runtimes work internally. If you're fine with keeping up with our internal changes, and having that hash potentially change across JRuby versions, then it's fine. At this point I think having a single global serial number is probably the best way to go. All places where we tickle fine-grained invalidation state would also want to bump this other value. @enebo You have any thoughts on this? |
Thanks. That got me re-reading your Invokedynamic blog entry.
Yes, I'm fine with it. But I'm hoping that even with the contents of the hash changing between versions of MRI and JRuby, I won't have to change my code. As long as the hash as a whole can serve as a change canary. |
And a (EDIT:)
I'd just expect a certain relation: if a new class id is allocated, the counter must change as well. |
If this gives some value and is pseudo compatible I am ok with this. It should not be hard to add this behavior. |
Ok, so if we're willing to go forward with this...it's design time. Do we want to just add one new counter? Or should we take advantage of the fact that we don't actually use the counter values to add a few different ones. I'm thinking method table modifications, constant table modifications, current highest-known class ID at least. If we're going to add counters wouldn't it be better to expose different changes? |
@headius so long as the design supports adding new values we can start simple with obvious things but is this compatible codewise with how MRI supports it? If not should we even use the same API. I guess I am interested in somewhat working without people having to have 2 code paths but at the same time if we expose different stats perhaps we should expose them in a clean way and not try to be as similar to MRI API? I am just asking questions... |
I think two code paths will be fine. Certainly better than the current situation anyway. It's basically +2 lines for my usage. And we have to run the code in JRuby on CI either way. |
@headius EuRuKo ping! |
Ok, with a maintenance release out of the way, perhaps we can circle back to this one... It sounds like the best option would be to provide our own APIs to access method cache information, and to add a global method serial number that can be used to track all method cache invalidations. It turns out that we actually did have this functionality in a different form, for a while. The I will look at restoring this counter, making it track individual invalidations, and exposing it in an appropriate way. |
@dgutov To clarify...you are only interested in knowing when any method change happened, right? I may not have made clear above, it is possible to determine if an individual class has changed using current features.
Does this align with MRI? Normally in JRuby, simply creating a new class does not invalidate any method caches, but may invalidate some constant caches when the class is assigned to a constant. Simply put: if a class is created but overrides no methods, there's nothing to invalidate. Nobody has called its methods yet, and they'll just be inherited from the parent. If what you are interested in is global method changes, we probably wouldn't include the creation of a new class. If what you really want is a global "SOMETHING CHANGED SOMEWHERE" serial number, that's doable too. I'd prefer to separate constant cache information from method cache information, personally. |
First and foremost, yes. IOW, I want a way to invalidate an "all methods in all classes" cache without going through
Yes. It increases the value of the
If it neither defines nor overrides any methods, it's not really interesting for my intended use case. So maybe do what's more convenient, implementation-wise.
IIUC, they are separate keys in MRI. It has three serials in total. We're using that hash as a single cache key in this particular use case, but having them separate adds some visibility into the internals of the VM, which is always a plus. |
Ahh ok I think I get it. So you're using the hash itself as your indication of change in the system, and so having methods and classes and such be different counters is fine. I think I'm on board now. |
Ok some notes before I try to impl this...
|
Looking into this today. Another note: here's the commit when MethodCache was removed from our mbeans: e1be5e7 I'll basically be restoring it, but as "Caches" this time. |
This is for #4384. For this initial version, methods and constants only have a single value indicating invalidation count globally. For constants, this will aggregate all constant names (which are invalidated separately). For methods, this will aggregate all class hierarchy invalidations, which will frequently mean a single invalidation event will increment this value many times, depending on the size of the hierarchy below it. This latter point could be improved but we don't really have a clear place to say "this is the top invalidation" at the moment.
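The counting rule for methods described above can be shown with a small toy model. Node and invalidate are hypothetical names and do not correspond to JRuby classes; the sketch only assumes that invalidating a class also invalidates every class below it, and that each one adds one to the global count.

```ruby
# Toy model: one global count, bumped once per class in the affected hierarchy.
# Node and invalidate are hypothetical; they only mirror the counting rule above.
class Node
  attr_reader :children

  def initialize(parent = nil)
    @children = []
    parent.children << self if parent
  end

  # Invalidate this class and every descendant; return how many were touched.
  def invalidate
    1 + children.inject(0) { |count, child| count + child.invalidate }
  end
end

root = Node.new
a = Node.new(root)
b = Node.new(root)
Node.new(a) # a grandchild of root

method_invalidation_count = 0
method_invalidation_count += root.invalidate # a single def near the top
method_invalidation_count += b.invalidate    # a single def on a leaf class
puts method_invalidation_count
```

A definition on a leaf adds one, while a definition near the top of a large hierarchy adds one per descendant, so the value is useful as a change indicator but not as a count of def events.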
Ok, so 0e6efdd added the Caches mbean, and it appears to be working. The screenshot below is from VisualVM and JRuby running Since we don't have a RubyVM namespace, we need to decide on a home for this. Currently, you could reach it using |
With the addition of the Caches mbean, this is largely done, but we need to decide on a suitable API. |
In the absence of a better suggestion, I have added JRuby::Util::cache_stats:
I believe this provides everything you need. It also tells me we should look into flattening high-level invalidations better, because we're invalidating nearly 900 classes for this one def. Reopening a user class has the invalidation result you'd expect, since there's no descendants to invalidate:
This is still considered an internal API subject to change, but if y'all have better naming suggestions it would help convince us to support this as an official thing. |
Thanks, Charles!
I don't have better naming suggestions (this one seems fine), but hopefully a diagnostic benefit like this will convince you to keep it around anyway. ;-) I'll try it when it's released, since there doesn't seem to be any ruby-build recipe for this branch. BTW, there seems to be a difference between what MRI does and what you do. These numbers count every invalidation, and MRI keeps counters that invalidations are based on, I think (maybe they're performed lazily). Both should be fine for my purposes, but maybe keep this in mind if counting invalidations starts to be a performance concern. |
Well, it's released. I guess this is what it will be named 😁 |
Looks good. Thank you! |
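For readers who want to use this from code that runs on both runtimes, the "two code paths" mentioned earlier might look roughly like the sketch below. vm_change_token is a hypothetical helper name. The JRuby branch assumes that JRuby::Util.cache_stats (the name given above) becomes available after requiring the jruby library and returns a Hash; the contents of either hash are internal and may change between versions, so the value is only compared for equality.

```ruby
# Sketch: pick whichever change token the current runtime offers.
# vm_change_token is a hypothetical helper. The JRuby branch assumes
# JRuby::Util.cache_stats is available after `require 'jruby'`.
def vm_change_token
  if RUBY_ENGINE == 'jruby'
    require 'jruby'
    JRuby::Util.respond_to?(:cache_stats) ? JRuby::Util.cache_stats : nil
  elsif defined?(RubyVM) && RubyVM.respond_to?(:stat)
    RubyVM.stat
  end
  # Falls through to nil on other runtimes; callers should then rebuild
  # unconditionally or fall back to a full ObjectSpace scan.
end

before = vm_change_token
# ... code that may or may not define classes and methods ...
changed = before.nil? || vm_change_token != before
puts changed
```

Treating the whole hash as an opaque token, rather than reading individual keys, keeps the calling code the same even when the keys differ between runtimes or versions.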
Or its JRuby counterpart (jruby/jruby#4384). Truffle doesn't seem to have anything comparable, so this breaks compatibility.
Environment
Expected Behavior
We want to have a fast way to determine whether the set of modules, classes and methods in the current runtime didn't change since the last time a certain method was invoked. One can do that in MRI 2.1 and newer by comparing the return values of RubyVM.stat. Is there a JRuby-specific way to do that, aside from doing a full ObjectSpace scan?
Here's the current use case: pry/pry#1583