Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JRuby counterpart for RubyVM.stat #4384

Closed
dgutov opened this issue Dec 14, 2016 · 33 comments
Closed

JRuby counterpart for RubyVM.stat #4384

dgutov opened this issue Dec 14, 2016 · 33 comments

Comments

@dgutov
Copy link

dgutov commented Dec 14, 2016

Environment

jruby 1.7.26 (1.9.3p551) 2016-08-26 69763b8 on OpenJDK 64-Bit Server VM 1.8.0_111-8u111-b14-2ubuntu0.16.04.2-b14 +jit [linux-amd64]

Expected Behavior

We want to have a fast way to determine whether the set of modules, classes and methods in the current runtime didn't change since the last time a certain method was invoked. One can do that in MRI 2.1 and newer by comparing the return values of RubyVM.stat. Is there a JRuby-specific way to do that, aside from doing a full ObjectSpace scan?

Here's the current use case: pry/pry#1583

@headius
Copy link
Member

headius commented Dec 14, 2016

Looks like RubyVM.stat produces a hash of three values: global_method_state, global_constant_state, and class_serial.

On JRuby there's no equivalent for global_method_state since we do not invalidate methods globally. However, this could be emulated by simply adding another global serial number; we would not use it internally, but it would indicate how many invalidation events had occurred.

Similarly there's no one global_constant_state because we invalidate constants on a per-name basis (MRI may do something similar, so I'm curious what that value means for them). This could also be simulated by adding an additional serial number.

And I'm guessing here, but I think class_serial is basically just a monotonically-increasing class ID, so you can see if any new classes have been added. This exists in JRuby, but unfortunately it is not exposed right now. The best you'd be able to do would be to define a new class and confirm its ID is one higher than the previous value.

Note also that what does exist is all accessible from Ruby today, since JRuby can reflect back into itself:

p JRuby.runtime.alloc_module_id # n
Class.new # two classes created
p JRuby.runtime.alloc_module_id # n + 3

@dgutov
Copy link
Author

dgutov commented Dec 14, 2016

However, this could be emulated by simply adding another global serial number; we would not use it internally, but it would indicate how many invalidation events had occurred.

It might be helpful, but after experimenting a little, it seems like in MRI's case the class serial is the one most helpful for us. Take this class:

class C
  def foo
  end
end

Both of the following snippets increase class_serial but not global_method_state:

class C
  def bar
  end
end
def C.baz
  123
end

The best you'd be able to do would be to define a new class and confirm its ID is one higher than the previous value.

This is equivalent to checking JRuby.runtime.alloc_module_id, right? Which is a good recommendation, thanks.

But while it might be "good enough" for our use case of "all methods cache" invalidation, it doesn't work like class_serial: the snippets above don't affect it.

irb(main):011:0> JRuby.runtime.allocModuleId
=> 1605
irb(main):012:0> class C
irb(main):013:1> def bar
irb(main):014:2> end
irb(main):015:1> end
=> nil
irb(main):016:0> JRuby.runtime.alloc_module_id
=> 1608
irb(main):017:0> class C
irb(main):018:1> def bar
irb(main):019:2> 3
irb(main):020:2> end
irb(main):021:1> end
=> nil
irb(main):022:0> JRuby.runtime.alloc_module_id
=> 1609
...
irb(main):035:0> JRuby.runtime.alloc_module_id
=> 1622
irb(main):036:0> def C.baz
irb(main):037:1> asdasd
irb(main):038:1> end
=> nil
irb(main):039:0> JRuby.runtime.alloc_module_id
=> 1623

@headius
Copy link
Member

headius commented Jan 6, 2017

There's a simple reason for this: in order to reduce cache thrashing on singleton objects, we do not assign a new ID to a singleton class until it defines a method. The justification here is that the singleton class is indistinguishable from the parent class for method-caching purposes (the main reason for this ID) so there's no need to invalidate call sites that have seen the parent.

And not to put too fine an edge on it, but you're depending on very internal behavior of both implementations. You're lucky we still even have the class ID...we don't need it these days.

The down side of incrementing like MRI is not clear. I don't know how common it is to create a new singleton object but never define methods or include modules into it. Your case here is pretty narrow.

@headius
Copy link
Member

headius commented Jan 6, 2017

FWIW a this point we could add something like RubyVM.stat that just exposes class_id, but I'm very reluctant to define RubyVM at all, since I know there's code out there using to detect it's running on MRI, including parts of stdlib itself.

@dgutov
Copy link
Author

dgutov commented Jan 6, 2017

There's a simple reason for this: in order to reduce cache thrashing on singleton objects, we do not assign a new ID to a singleton class until it defines a method.

Are you referring to the def C.baz example? But it defines a method (I imagine, the first method in the singleton class). And yet, alloc_module_id doesn't change (or rather only changes by 1 because I call it directly).

In the first example, the problem is that redefining a method doesn't change the alloc id. Neither does this happen if I reopen an existing class and define a new method.

I don't know how common it is to create a new singleton object but never define methods or include modules into it. Your case here is pretty narrow.

Sorry, which of the examples are you referring to?

And not to put too fine an edge on it, but you're depending on very internal behavior of both implementations.

The only thing we're depending on is that RubyVM.stat returns a value that changes if some new class or method or added or changed. That's a pretty clear idea, I think.

We're not depending on the internal structure of the returned value.

@dgutov
Copy link
Author

dgutov commented Jan 6, 2017

FWIW a this point we could add something like RubyVM.stat that just exposes class_id, but I'm very reluctant to define RubyVM at all, since I know there's code out there using to detect it's running on MRI, including parts of stdlib itself.

Why not add a JRubyVM.stat instead?

@headius
Copy link
Member

headius commented Jan 7, 2017

Are you referring to the def C.baz example? But it defines a method (I imagine, the first method in the singleton class). And yet, alloc_module_id doesn't change (or rather only changes by 1 because I call it directly).

You know what, you're right, and that is odd. I would have expected it to change. For some reason I read your example like class << C or something, which would not cause a new ID if no method table changes were present.

The only thing we're depending on is that RubyVM.stat returns a value that changes if some new class or method or added or changed. That's a pretty clear idea, I think.

Well, you're depending on that change happening for the same events across runtimes. JRuby goes much farther than MRI in optimizing some of these areas, and we won't invalidate method caches nearly as often. That's why it only seems feasible to me to provide the class ID, since we already have that concept for cache identities.

If you're ok with the limitation that only visibly different classes will trigger a new ID, then what we have today should work fine (after I figure out your singleton example).

Why not add a JRubyVM.stat instead?

Indeed, that's an alternative. Of course, you managed to get at the ID without us adding anything...that's by design.

What I'd prefer is a standard API that both JRuby and MRI implement. Until then, I'm willing to add blessed interfaces for some of these VM stats to avoid having you bind to internal APIs. Probably under JRuby::VM.

@headius
Copy link
Member

headius commented Jan 7, 2017

You know what, you're right, and that is odd. I would have expected it to change. For some reason I read your example like class << C or something, which would not cause a new ID if no method table changes were present.

I took a second look at the code and I have an explanation: we eagerly create the class's singleton class at construction time. So in this case, I believe it's working properly: no new class has been created. MRI may defer singleton class creation until it is needed.

@dgutov
Copy link
Author

dgutov commented Jan 7, 2017

Well, you're depending on that change happening for the same events across runtimes.

Rather, I'm depending on it on recent MRI, and hoping other runtimes will adopt something similar. RubyVM is strictly MRI-specific.

JRuby goes much farther than MRI in optimizing some of these areas, and we won't invalidate method caches nearly as often.

When a class has new method added, or a method definition is changed, don't you have to invalidate some cache anyway?

If you're ok with the limitation that only visibly different classes will trigger a new ID

What do you mean by "visibly different"? I think both of my examples result in visibly different classes. Here's another one:

irb(main):001:0> JRuby.runtime.alloc_module_id
=> 1599
irb(main):002:0> class C
irb(main):003:1> def foo
irb(main):004:2> end
irb(main):005:1> end
=> nil
irb(main):006:0> JRuby.runtime.alloc_module_id
=> 1602
irb(main):007:0> class C
irb(main):008:1> def bar
irb(main):009:2> end
irb(main):010:1> end
=> nil
irb(main):011:0> JRuby.runtime.alloc_module_id
=> 1603

The class C got "visibly different" (new method added), yet the counter did not increase.

Of course, you managed to get at the ID without us adding anything...that's by design.

That might be good enough if the counter got incremented more often. In particular, I'd like to be confident that it would happen when a user types out a new method and re-loads the source file containing it.

Having any slight edit of existing methods result in counter increment as well would be ideal, but not necessary so far, for me.

What I'd prefer is a standard API that both JRuby and MRI implement.

The MRI team might be reluctant to standardize the stat structure anyway. The doc says it can change in future releases.

@headius
Copy link
Member

headius commented Jan 7, 2017

What do you mean by "visibly different"? I think both of my examples result in visibly different classes. Here's another one:

Visibly difference from its parent class. We assign a new ID to a class exactly once, when it begins to differ from its parent. From then on it lives as its own class for caching purposes.

Adding bar to C doesn't change class identity.

I think some clarification is needed here.

JRuby uses this module ID only to uniquely identify classes. On rare occasions, two different class objects will have the same ID if they do not yet differ. Once one of them differs, it will get its own ID exactly once.

This ID is used for type-checking: to see if the incoming receiver object is the same as it was last time (or visibly indistinguishable from what we saw last time). In your case, adding bar to the class does not change the type identity of existing C objects.

When a class has new method added, or a method definition is changed, don't you have to invalidate some cache anyway?

We also have a separate mechanism that tracks method tables. Each class has its own, separate serial number. This is an indication of the class's current "revision" if you will. Adding bar to C in your example will cause this number to be incremented. Each class has its own number, unlike in MRI where there's only one global serial number.

So in order to make all the cases you want visible, we'd need to inspect every class in the system and derive from that whether anything had changed. Or add in one more serial number that we don't use, but which you want because it's how MRI does things in an internal, unofficial API.

And that's what I mean by depending on implementation-specific details.

Rather, I'm depending on it on recent MRI, and hoping other runtimes will adopt something similar. RubyVM is strictly MRI-specific.

Ok, we're on the same page about having our own JRuby-specific VM namespace for these sorts of internal details. However, like RubyVM, I'm not really comfortable blessing such an interface as a supported API, since our internal details have and will change.

I think we can come up with something appropriate for your use...but I don't feel like I understand all the cases you want to cover yet. Just asking for something "like MRI" is a bit vague, because we are similar in some ways and very, very different in others. I'm not sure the stat values MRI exposes directly map to anything in JRuby, but parts of them map to parts of us.

@dgutov
Copy link
Author

dgutov commented Jan 7, 2017

We assign a new ID to a class exactly once

That seems to be the main point. Thanks.

This ID is used for type-checking: to see if the incoming receiver object is the same as it was last time (or visibly indistinguishable from what we saw last time). In your case, adding bar to the class does not change the type identity of existing C objects.

I imagine you also have to check there that the receiver's method table hasn't changed. So JRuby checks two counters (or does something similar), while MRI gets away with checking just one. I suppose it's a tradeoff between having a quicker validation check and invalidating fewer callsites when a method is (re)defined.

So in order to make all the cases you want visible, we'd need to inspect every class in the system and derive from that whether anything had changed. Or add in one more serial number that we don't use, but which you want because it's how MRI does things in an internal, unofficial API.

One more serial number would be ideal, but exposing method cache ids on each module might also work fast enough (I'd have to test), as well as provide a more correct invalidation information.

I'd need to know a good way to compute a cache key from them, though. Would summing them up be fast enough? Or if the method cache ids were to be made always-increasing and unique across all modules, I could just take a max.

And that's what I mean by depending on implementation-specific details.

I disagree with that characterization because, even if JRuby undergoes architecture changes in the future, it should still be possible to provide the stat information that would be useful for our purposes. Even if the contents of the hash will have to change somehow.

@dgutov
Copy link
Author

dgutov commented Jan 7, 2017

I think we can come up with something appropriate for your use...but I don't feel like I understand all the cases you want to cover yet. Just asking for something "like MRI" is a bit vague, because we are similar in some ways and very, very different in others.

Allow me to copy-paste from the first message in this issue:

We want to have a fast way to determine whether the set of modules, classes and methods in the current runtime didn't change since the last time a certain method was invoked.

That's basically the only requirement. So we'll have a certain structure that is derived from all methods in all classes defined in the current runtime. And we want to know when the said structure needs to be rebuilt to accurately reflect the current methods and classes.

We'll use it to make code completion faster in Pry (in the "call target is unknown" case). I also anticipate using it in another, non-REPL package, also for code completion.

I'm not sure if there are any uses of RubyVM.stat out there, aside from diagnosing performance problems on MRI. If something comes up, we could add those as additional requirements. All I've found so far as "MRI internals" articles and libs like https://github.com/simeonwillbanks/busted.

@headius
Copy link
Member

headius commented Jan 9, 2017

I imagine you also have to check there that the receiver's method table hasn't changed. So JRuby checks two counters (or does something similar), while MRI gets away with checking just one. I suppose it's a tradeoff between having a quicker validation check and invalidating fewer callsites when a method is (re)defined.

By saying we check these values I was oversimplifying a bit. In both cases, the check is not actively done; it works with what's called a "safepoint" in the JVM. Normally, if nothing changes, there's no overhead. If something changes, the code gets invalidated and never run again, and code already running has to back off. So while we have finer-grained tracking of system changes, it actually translates into much less overhead than MRI.

I disagree with that characterization because, even if JRuby undergoes architecture changes in the future, it should still be possible to provide the stat information that would be useful for our purposes. Even if the contents of the hash will have to change somehow.

Sure, but that's exactly what I mean by impl-specific. The contents of that hash reflect how our runtimes work internally. If you're fine with keeping up with our internal changes, and having that hash potentially change across JRuby versions, then it's fine.

At this point I think having a single global serial number is probably the best way to go. All places where we tickle fine-grained invalidation state would also want to bump this other value.

@enebo You have any thoughts on this?

@dgutov
Copy link
Author

dgutov commented Jan 10, 2017

So while we have finer-grained tracking of system changes, it actually translates into much less overhead than MRI.

Thanks. That got me re-reading your Invokedynamic blog entry.

If you're fine with keeping up with our internal changes, and having that hash potentially change across JRuby versions, then it's fine.

Yes, I'm fine with it. But I'm hoping that even with the contents of the hash changing between versions of MRI and JRuby, I won't have to change my code.

As long as the hash as a whole can serve as a change canary.

@enebo
Copy link
Member

enebo commented Jan 10, 2017

@dgutov @headius can we get a summary on the plan? Basically we are just adding an additional counter which if it is seen to change it only means some type in the system has changed. It does not have any additional connection to the class ids or does it?

@dgutov
Copy link
Author

dgutov commented Jan 10, 2017

Basically we are just adding an additional counter which if it is seen to change it only means some type in the system has changed.

And a (EDIT:) JRuby::VM.stat method which returns a hash with this counter as the sole key-value pair (for now).

It does not have any additional connection to the class ids or does it?

I'd just expect a certain relation: if a new class id is allocated, the counter must change as well.

@enebo
Copy link
Member

enebo commented Jan 10, 2017

If this gives some value and is pseudo compatible I am ok with this. It should not be hard to add this behavior.

@headius
Copy link
Member

headius commented Jan 10, 2017

Ok, so if we're willing to go forward with this...it's design time. Do we want to just add one new counter? Or should we take advantage of the fact that we don't actually use the counter values to add a few different ones. I'm thinking method table modifications, constant table modifications, current highest-known class ID at least. If we're going to add counters wouldn't it be better to expose different changes?

@enebo
Copy link
Member

enebo commented Jan 10, 2017

@headius so long as the design supports adding new values we can start simple with obvious things but is this compatible codewise with how MRI supports it? If not should we even use the same API. I guess I am interested in somewhat working without people have to have 2 code paths but at the same time if we expose different stats perhaps we should expose them in a clean way and not try to be as similar to MRI API? I am just asking questions...

@dgutov
Copy link
Author

dgutov commented Apr 7, 2017

I guess I am interested in somewhat working without people have to have 2 code paths

I think two code paths will be fine. Certainly better than the current situation anyway.

It's basically +2 lines for my usage. And we have to run the code in JRuby on CI either way.

@dgutov
Copy link
Author

dgutov commented Sep 30, 2017

@headius EuRuKo ping!

@headius
Copy link
Member

headius commented Nov 10, 2017

Ok, with a maintenance release out of the way, perhaps we can circle back to this one...

It sounds like the best option would be to provide our own APIs to access method cache information, and to add a global method serial number that can be used to track all method cache invalidations.

It turns out that we actually did have this functionality in a different form, for a while. The MethodCacheMBean we expose for JMX management had a flushCount property that indicated whenever all method caches in the system had to be flushed (usually because someone modified Kernel, and the invalidation cascaded down). This bean was not maintained, and the related flush mechanism changed, so it disappeared.

I will look at restoring this counter, making it track individual validations, and exposing it in an appropriate way.

@headius
Copy link
Member

headius commented Nov 10, 2017

@dgutov To clarify...you are only interested in knowing when any method change happened, right? I may not have made clear above, it is possible to determine if an individual class has changed using current features.

I'd just expect a certain relation: if a new class id is allocated, the counter must change as well.

Does this align with MRI? Normally in JRuby, simply creating a new class does not invalidate any method caches, but may invalidate some constant caches when the class is assigned to a constant. Simply put: if a class is created but overrides no methods, there's nothing to invalidate. Nobody has called its methods yet, and they'll just be inherited from the parent.

If what you are interested in is global method changes, we probably wouldn't include the creation of a new class.

If what you really want is a global "SOMETHING CHANGED SOMEWHERE" serial number, that's doable too. I'd prefer to separate constant cache information from method cache information, personally.

@dgutov
Copy link
Author

dgutov commented Nov 11, 2017

you are only interested in knowing when any method change happened, right?

First and foremost, yes. IOW, I want a way to invalidate an "all methods in all classes" cache without going through ObjectSpace.each_object first.

Does this align with MRI?

Yes. It increases the value of the class_serial key. Assigning it to a constant also increases the value of the global_constant_state key.

Simply put: if a class is created but overrides no methods, there's nothing to invalidate. Nobody has called its methods yet, and they'll just be inherited from the parent.

If it defines nor overrides no methods, it's not really interesting for my intended use case. So maybe do what more convenient, implementation-wise.

I'd prefer to separate constant cache information from method cache information, personally.

IIUC, they are separate keys in MRI. It has three serials in total. We're using that hash as a single cache key in this particular use case, but having them separate adds some visibility into the internals of the VM, which is always a plus.

@headius
Copy link
Member

headius commented Nov 12, 2017

Ahh ok I think I get it. So you're using the hash itself as your indication of change in the system, and so having methods and classes and such be different counters is fine.

I think I'm on board now.

@headius
Copy link
Member

headius commented Nov 12, 2017

Ok some notes before I try to impl this...

  • Old MethodCache mbean would be worth looking at to see how it maintained a global count. I suspect it's the global "generation" counter, but that is now only used when not running with invokedynamic,
  • Generation-based invalidation is available but only for individual modules, based on the way the API is designed. It needs to be modified to accept any object that has appropriate generation-updating logic.
  • It might be worth doing some cleanup of the many ways we invalidate and centralizing things a bit better before adding this. And it would be nice to reinstate the MethodCache mbean as a general VM cache state bean.

@headius headius added this to the JRuby 9.1.15.0 milestone Nov 12, 2017
@headius
Copy link
Member

headius commented Dec 5, 2017

Looking into this today.

Another note: here's the commit when MethodCache was removed from our mbeans: e1be5e7

I'll basically be restoring it, but as "Caches" this time.

headius added a commit that referenced this issue Dec 5, 2017
This is for #4384.

For this initial version, methods and constants only have a single
value indicating validation count globally. For constants, this
will aggregate all constant names (which are invalidated
separately). For methods, this will aggregate all class hierarchy
invalidations, which will frequently mean a single invalidation
event will increment this value many times, depending on the size
of the hierarchy below it. This latter point could be improved
but we don't really have a clear place to say "this is the top
invalidation" at the moment.
@headius
Copy link
Member

headius commented Dec 5, 2017

Ok, so 0e6efdd added the Caches mbean, and it appears to be working. The screenshot below is from VisualVM and JRuby running loop { def foo; end }

image

Since we don't have a RubyVM namespace, we need to decide on a home for this. Currently, you could reach it using JRuby.runtime.caches.method_invalidation_count and constant_invalidation_count, but this kinda sorta exposes an internal API we don't necessarily want people calling directly. Thoughts, @enebo @dgutov? JRuby.stat seems too general...maybe JRuby.method_invalidation_count, possibly with JRuby::VM namespacing?

@headius
Copy link
Member

headius commented Dec 5, 2017

With the addition of the Caches mbean, this is largely done, but we need to decide on a suitable API.

headius added a commit that referenced this issue Dec 6, 2017
@headius
Copy link
Member

headius commented Dec 6, 2017

In the absence of a better suggestion, I have added JRuby::Util::cache_stats:

$ jruby -e 'p JRuby::Util::cache_stats; def foo; end; XYZ = 1; p JRuby::Util.cache_stats'
{:method_invalidation_count=>12667, :constant_invalidation_count=>1560}
{:method_invalidation_count=>13527, :constant_invalidation_count=>1561}

I believe this provides everything you need. It also tells me we should look into flattening high-level invalidations better, because we're invalidating nearly 900 classes for this one def.

Reopening a user class has the invalidation result you'd expect, since there's no descendants to invalidate:

$ jruby -e 'p JRuby::Util::cache_stats; class XYZ; def foo; end; end; p JRuby::Util.cache_stats'
{:method_invalidation_count=>12667, :constant_invalidation_count=>1560}
{:method_invalidation_count=>12668, :constant_invalidation_count=>1561}

This is still considered an internal API subject to change, but if y'all have better naming suggestions it would help convince us to support this as an official thing.

@headius headius closed this as completed Dec 6, 2017
@dgutov
Copy link
Author

dgutov commented Dec 7, 2017

Thanks, Charles!

It also tells me we should look into flattening high-level invalidations better, because we're invalidating nearly 900 classes for this one def.

I don't have better naming suggestions (this one seems fine), but hopefully a diagnostic benefit like this will convince you to keep it around anyway. ;-)

I'll try it when it's released, since there doesn't seem to be any ruby-build recipe for this branch.

BTW, there seems to be a difference between what MRI does and what you do. These numbers count every invalidation, and MRI keeps counters that invalidations are based on, I think (maybe they're performed lazily). Both should be fine for my purposes, but maybe keep this in mind if counting invalidations starts to be a performance concern.

@headius
Copy link
Member

headius commented Dec 7, 2017

Well, it's released. I guess this is what it will be named 😁

@dgutov
Copy link
Author

dgutov commented Dec 7, 2017

Looks good. Thank you!

dgutov added a commit to dgutov/robe that referenced this issue Mar 27, 2024
Or its JRuby counterpart (jruby/jruby#4384).

Truffle doesn't seem to have anything comparable, so this breaks compatibility.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants