-
-
Notifications
You must be signed in to change notification settings - Fork 924
Add support for #load_iseq-like callback #5206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
There might be a way to hack something like this in today, but it would be tricky. In any case, I think it's a great idea. I was just talking with @apotonick about making it possible to hook into code loading/parsing/compiling so the AST or IR could be tweaked along the way. The two ideas seem very similar. |
Great! I'd be interested in the hack for how to do it today. This would allow us to support older versions, and once this new feature you are talking about is released, we can detect which one to use. |
Hey I'm back home and back to proper work again. So After poking around a bit, I'm afraid I can see no way of doing this with current JRuby. Once the file to load has been found, it is immediately passed off to the parser and compiler, and there's no downcalls into Ruby during that process. However, this should be fairly simple to support. Instead of iseq, we would have IR. Instead of a method you overwrite (wow, super gross) I would prefer to have a way to register file handlers. They'd return either false or the compiled IR for the script. If false, the next in the chain would be invoked, and so on until no handlers are left and the default parse+compile executes. @enebo Thoughts on this? We already have mechanisms in place for saving and loading IR to a binary. All we really need to do is open up the loading and saving mechanisms to Ruby and add this callback before we load a file (if callback has been added; no-op and fall back to normal handling otherwise). We could also retool the persistence mechanism to operate from Ruby using this callback. It would be the same persistence logic, but it would run in Ruby and be observable from Ruby, so loading and IR persistence could be hooked into and observed by tools and tweaked by customizers. We might also consider doing this as a Java callback, which could obviously still be used from Ruby. This would make it more standard for us to take a filename and load it in a number of ways, like using a source cache or AST cache instead of an IR cache, precompiling it all the way to JVM bytecode immediately, searching in classpath for an equivalent precompiled script, etc. All of these would just be hooks into the loading mechanism, and more standard than the messy hodge-podge of steps we have right now. |
@headius @MaxLap what is the most basic description of what is needed? Source which is loaded and location + perhaps actual source in case of eval? I really don't see any value to returning the IR itself. You don't actually use iseq YARV instr data itself do you? I think if you want source the solution is to go way earlier than IR/ISeq and do what @headius suggests and make an event/callback on when parser is invoked. parser of non-method/blocks will not be exact moment it is executed so that maybe is an issue but evals and all other execution scopes will be at same point it executes. Another point would be when AST is converted to our IR (second solution). A callback there would still generate all non-block/methods when that code is about to execute and we can return filename/linenumber (although source internally is no longer available). For methods we are lazy so when IR is built for them it means we plan on executing at that point. Blocks are weirder in we generate IR at same time the non-block scopes are made into IR so you need to know whether you are in a method or not. You will not know when the block actually executes though (halting problem). |
The most basic description of what i need: When anyone executes ruby code from a file, so So no matter how you want to do it, as long as I can get the original code, modify it, and then have JRuby execute it as if it came fron the originaly requested file (same stack trace), I'm happy. It can be that i must compile somehow to an intermediate representation first like All i want it to modify ruby code that is backed by a file before it gets executed,so that i can then generate coverage report. I use Thanks |
@MaxLap thanks for that description. I think based on this we want a callback from our loadservice which gives you the source before our parser starts parsing. We probably take the result of that callback as the actual source to parse. We could support giving back AST or IR but those we like to be able to change over time so supporting them as a public API is not desirable. This callback we just need to define. Perhaps JRuby::Parser::load_filter { |filename, line_number, src| return mod_src }? Thinking about generic usage of this adding line_number probably would allow someone to not change source but properly use this to record parsing events. The return asymmetry bugs me (f(file, line, src) -> src) a little bit though. @headius what do you think? Now that I wrote that out I am wondering if there are any other novel uses of having a method like this. Although this issue does not seem concerned with evals I could see this as a useful way of capturing all eval'd source so people could view it somehow later. The only possible problem I can think of is maybe during kernel loading some aspect of this won't work (since the system is not fully bootstrapped yet) so we may only want this on after that point? Oh jetlag you make me ramble. |
I forgot to mention in my description that i need the file path too. Is the line number useful? Maybe if you decide to support this for If you do decide to support For eval, it seems the meaning of file_path (and line number) would be different. In the require case, those are the file being executed and in the eval case, they are the location of the callsite? (can't be anything else as the code could be a string from another file) To me, those 2 points mean that if you ever decide to have a similar callback for eval, it should be a different one. And I think this means the callback for require most likely doesn't need the line number. Unless there is something i don't see or misunderstood. |
I'm sorry I missed an opportunity to sit down with you @enebo at RubyKaigi to discuss this.
Sounds pretty good. I'd put
If you want, but best use a different hook as this wouldn't be composable, i.e. only one such filter could be specified. I'm thinking of making a small gem that would have the same interface for JRuby, MRI and TruffleRuby that would allow to compose such filters. In JRuby it would be "natural", while in MRI the caveat would be that it would use I'd like to get @eregon of TruffleRuby in the discussion too, as we'll have the same request for it :-) |
Revised idea:
load and require will accept block which is Lastly default implementation of load_filter and require_filter can be same method aliased. |
Sounds perfect for our needs. Lifecycle wise, the normal use-case is that in the test_helper / spec_helper, we get required and then we add the load_filter. I guess at that point everything, (or almost?) including this new feature, will be available? |
First a quick question:
Is there something the Returning to the feature in question... So the main use case proposed here is to be able to intercept the load of Ruby files and eval source and in some way manipulate those sources on their way to being parsed and compiled. Or after parsed or after compiled but before bound or something. The arguments passed in should be everything necessary to properly read, parse, and compile the sources. I believe the file+src for normal loads/requires and file+line+binding+src would be sufficient. The method name surrounding the eval is not provided there but I believe it can be inferred from the binding. Is more direct access needed? Then there's the actual return value, or what form the source is in. This is one area that's not clear to me. If this API is expected to be somewhat portable, I can only see a few possible options: raw source, lexer/parser events for that source as from ripper, or some third offline parsed result as from the This last one would be unusual to provide without making the related library available, and perhaps less useful for intercepting sources since no implementation directly consumes that format...we'd have to reverse the modified parsed output back into source. Then there's ripper events, but they would only be useful for manipulating the parsed result if you can basically reproduce those events in some new order or composition and feed them back into the lex/parse logic. That functionality does not exist right now. I think we end up with raw sources, so all you're really intercepting is the source after it has been resolved (in the case of load/require) and allowing it to be modified en route to the parser. Consumers of the API can then use some other parsing API ( @enebo You had some concerns about the asymmetry of (file, src) -> src, but the caller already knows the filename and unless we're talking about being able to redirect load/require lookups to another file (directly, without loading that other file manually) I don't think it's necessary to do anything else. The filter functions translate from a given file with given source (and a starting line number and binidng in the eval case) into some other source to process for that file. Seems ok? |
The short answers: In a more detailed explanation of what It has branch coverage and is compatible with MRI 2.1 and up. We also go deeper than branch coverage, detecting coverage per node from the AST. Doing this allows us to detect when things were interrupted by an Our branch coverage also considers short-circuiting operators, so || and &&, which are not part of the branch coverage introduced in MRI 2.5. While a little bit raw, we also detect default method arguments that were never used. We have ideas for more things to cover once the foundation is more robust.
If Coming back to the subject:
I don't think there was anything else I had to comment on. Your reasoning made sense. |
@headius my complaint about symmetry I am no longer bothered by it. It looked worse in the variation where I was also passing linenumber. In that case it feels like you should be able to tweak linenumber but filename I don't think you really could since loadservice already is adding an entry for it and so forth. Ok good. I think our version should start as a module JRuby::Parser and if MRI and TR get behind this then we can extend whatever module is the standard one later once we come to agreement. |
I'm having trouble with JRuby::Parser as the namespace. This is certainly related to the parser, but the hooks are explicitly fired before parsing. It's more like a load/require/eval hook. I'm thinking it's a good time to introduce JRuby::VM as a module for APIs internal to JRuby. It matches up better with RubyVM in CRuby, and it is general enough to hold these hook methods. We also still have JRuby::Util. Perhaps it's better to use that namespace rather than introduce a new one? |
Ok, work in progress but I hit a snag. What encoding should the source string be in? Because the parser will not have run yet, we do not know if there's a magic comment. We could process just that much of the file, but I wonder whether we should expect the consumers of this API to do that processing and deal only in raw bytes. If we are looking for equivalence, CRuby has already parsed the file and so it keeps all encodings of all strings in the iseq. That would indicate that we should provide a properly-encoded string to the callbacks. What then if they modify the string to have a different magic encoding comment? The hooks are not difficult to add to LoadService. I'll push a prototype that just uses raw (BINARY) bytes for now. |
Ok I've pushed an initial prototype. There's some limitations at this point.
|
Example output loading Rake:
|
Oh, I see we probably want the full path to the file also. |
Improved the path returned.
Note that many of JRuby's own resources will be presented as URLs, such as |
Re namespace, I think |
@headius I don't care too much about using Parser but originally I had thought that load/require/eval are all either posthooks of loadservice or prehooks of parsing. Picking VM just broadens the API. If we do standardize with other Impls we cannot 'extend JRuby::VM' into the standardized name. OTOH we can just duplicate this stuff into whatever name is chosen or extend from whatever is chosen back to JRuby::VM. |
@headius in retrospect does require hook vs load hook even make sense? I see you added load_hook but I guess a require internally causes a load. |
I don't understand this. How is JRuby::VM any different from JRuby::Parser in this regard? In any case I figured if this feature (source to source pre-translation) were ever approved for standard Ruby, we'd just call our method from the actual method.
As far as actually loading and parsing the code there's no difference, so maybe not? Load and require differ only in how they search for the file (load requires file extension) and whether they update LOADED_FEATURES (load does not). Neither of these have much relevance for source-to-source-translation before parsing.
As pre-hooks it does make some sense to be JRuby::Parser. I'm struggling with naming of the method though. preparse? parser_hook or hook_parser? I lean toward longer names but that's not the Ruby way; add_parser_filter or add_source_filter? JRuby::Parser.preprocess? |
JRuby::VM is only different in that I can see us adding more methods unrelated to this particular feature as it is pretty generic namewise. With that said, of course we can just call into this method with a thin wrapper method. I just found if we bounded the callbacks into a single namespace it would be more aesthetically pleasing to integrate into a standard later. No big deal though. We also want to potentially designate between eval and file(require/load). Not sure on good name either. I would probably choose a slightly longer name as this is more internal for Ruby (and we have precedence for things like abort_on_exception). |
I'd be reluctant to mix in a JRuby::Whatever module directly into an official user-facing API anyway. In any case I doubt the official method names (if they ever happen) would match ours. add_load_hook and add_eval_hook? |
Yeah sounds good |
When thinking about names for this feature, I immediately think it "transforms the Ruby source", right before it's parsed. |
@eregon Yeah I had that concern too...like a load hook would be used to redirect to a different file or change the lookup process. I'm thinking along the lines of add_source_transform. |
transform_source(src, file, is_eval) |
@enebo Ok so the remaining problem I have is all the places where we do parsing of source. The tweaks I have in my commit above feel hacky and too spread out, and I only covered the few cases that I saw during my experiment. Maybe you have some thoughts here? Maybe we need some refactoring to funnel these disparate paths through fewer places? |
Time flies! Could we have an update? Any idea when you may have the time to work on this? |
@MaxLap Hey there, we've been kinda heads-down just trying to stabilize 9.2 the past couple months. I haven't forgotten this but we have not had time to look into it. Maybe we could chat about you trying to prototype something with us? |
@MaxLap This feature somewhat relates to https://bugs.ruby-lang.org/issues/15287#note-10
|
I've personally used |
If we are rewriting Ruby code as a String to Ruby code as a String, then |
Not much has changed here. CRuby continues to publish CRuby-specific features under RubyVM, which people then use and turn into de-facto standard features. If anyone more familiar with this particular feature can provide an update, it would be very helpful to me. As it stands this will likely get punted to 9.4 or closed altogether. |
Hello, I'm one of the dev of deep-cover and I am hitting a bit of a challenge to fully support JRuby.
Little context, we instrument Ruby code in order to add trackers (setters in a global variable) to detect what has been executed and what hasn't. This allows us to do branch coverage and "node" coverage, which basically deals with the coverage at the AST node granularity.
We are wondering if a method similar to
#load_iseq
of Ruby 2.3+ could be added to JRuby.We don't need anything related to compiling the code. All we need is a method that gets called when JRuby is about to execute Ruby code, either for a
#require
, a#load
, or when triggered by an autoload. That method would receive the absolute path to the file, and return the the actual Ruby code to execute.Maybe there is even already a way of doing this with some monkey patch?
The text was updated successfully, but these errors were encountered: