-
-
Notifications
You must be signed in to change notification settings - Fork 925
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dynamic "once" regexps are not as atomic as in MRI #2798
Comments
See #2798 for discussion about whether the entire body of a dynamic "once" regexp should be atomic.
After some discussion with @enebo, we figured out that
And once this has run once, MRI additionally flushes all the iseqs for the regexp and just leaves the resulting object in the iseq. So, unless that changes in MRI, we need to make the same guarantee. We also discovered that IR currently will evaluate all the pieces of the dregexp every time it is encountered, which is at best a performance problem and at worst a severe semantic difference with MRI. So this needs to be fixed. Marking for 9.1.0.0. Hopefully we can get it in. |
* Ensure only one regexp is ever cached for //o This is done using an AtomicReference in the compiled method for JVM6 and a field + atomic updater in the indy call site. In both cases, we may end up evaluating the operands twice, and the code that produced them may still run after caching (a bug, #2798), but we will at least guarantee to return exactly one regexp. * Add non-boxed paths to construct dregexp with up to 5 elements. * Add a ThreadContext-local Encoding[1] to use for encoding negotiation when preprocessing the dregexp elements. * If, at JIT time, a once-dregexp has already been encountered and cached in the instr, just emit that regexp directly into the bytecode. This new logic is faster than what we had before, likely because the locking I put in place for JVM6 was preventing the JVM from jitting (punted out with "COMPILE SKIPPED: invalid parsing" due to a flaw in my code). This new logic is lighter-weight and JITs fine. Given the benchmark from #3735: 9.0.5: 3.87s 9.1: 0.70s 1.7.24: 0.72s
I pushed 256e753, which removes the locking I had in the JVM6 jit and replaces it with an atomic reference. The evaluation of the dregexp/o operands may run in two threads at the same time, but that can't be fixed right now until IR wraps the entire dregexp/o in a locking mechanism as discussed above. So for now, we guarantee the dregexp/o will only ever return one value. |
Punting to 9.1.1 since this is unlikely to affect real users (since hopefully the /o regexp would not produce different results over time). |
I'm actually going to call this fixed. It's a grey area, to be sure, and the hassle to guarantee once-only evaluation of the If someone runs into this being a problem, they can file a bug and we can debate it with them and MRI. |
The test_once_multithread test in MRI's ruby/test_regexp.rb tries to guarantee that the contents of a dynamic "once" regexp (e.g.
/#{some_call}/o
) will execute exactly once. Current JRuby master does not even guarantee that it will update atomically, allowing "last thread wins" with potentially multiple updates. The latter issue I have a fix for, but I don't have a way to enforce synchronization around the entire //o body...because IR compiles it like this:My fix just puts atomic guarantees around the update of the cache.
I have filed https://bugs.ruby-lang.org/issues/11026 to clarify with ruby-core how atomic a dynamic "once" regexp should actually be.
The text was updated successfully, but these errors were encountered: