Dynamic "once" regexps are not as atomic as in MRI #2798

headius · 2015-04-02T22:08:44Z

The test_once_multithread test in MRI's ruby/test_regexp.rb tries to guarantee that the interpolated contents of a dynamic "once" regexp (e.g. /#{some_call}/o) execute exactly once. Current JRuby master does not even guarantee that the cached regexp is updated atomically, so the last thread wins and the cache may be updated multiple times. I have a fix for the atomic update, but I don't have a way to enforce synchronization around the entire //o body, because IR compiles it like this:

    %v_1 = copy("")
    %v_2 = call_0o(%self ;n:some_call, t:VA, cl:false)
    %v_3 = build_dregexp(%v_1, #{%v_2} ;options: RegexpOptions(kcode: NONE, kcodeDefault, once))

My fix just puts atomic guarantees around the update of the cache.
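
For anyone who wants to observe the behavior in question, here is a minimal sketch of a reproduction script. It is not the MRI test itself; the names `some_call`, `build`, and `count` are made up for illustration, and the `sleep` is only there to widen the window in which several threads can reach the literal.

```ruby
# Hypothetical reproduction script; not the actual MRI test.
count = 0
count_lock = Mutex.new

# Stand-in for `some_call`: records each evaluation of the interpolated body.
some_call = lambda do
  count_lock.synchronize { count += 1 }
  sleep 0.05 # give other threads time to reach the same literal
  "abc"
end

# A single /o literal site shared by every thread.
build = -> { /#{some_call.call}/o }

results = Array.new(4) { Thread.new { build.call } }.map(&:value)

puts "interpolation ran #{count} time(s)"
puts "distinct regexp objects: #{results.map(&:object_id).uniq.size}"
```

An implementation that runs the body exactly once should report a single evaluation and a single regexp object; one that only makes the cache update atomic may report more than one evaluation.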

I have filed https://bugs.ruby-lang.org/issues/11026 to clarify with ruby-core how atomic a dynamic "once" regexp should actually be.

headius · 2016-03-16T03:02:48Z

After some discussion with @enebo, we figured out that //o regexps in MRI do in fact prevent multiple threads (i.e. any threads after the first one) from running the interpolated bits, using the "once" instruction. "once" operates by checking a state field in the iseq:

* If it is null, it sets it to the current thread and proceeds to evaluate the regexp. When complete, it sets a special value to indicate the regexp has been processed and cached.
* If it is set to another thread, that thread goes into a Thread.pass loop while waiting for the winner thread to complete.
* If it is set to the special value, the regexp is just returned.
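
The three cases above can be sketched in plain Ruby. This is an illustration only, not MRI or JRuby source: `OnceCell`, `DONE`, and `fetch` are hypothetical names, a Mutex stands in for the atomic check-and-set on the state field, and resetting the state when the body raises is an assumption of this sketch. Recursive entry from the winning thread is not handled.

```ruby
# Hypothetical helper modelling the "once" state field; not MRI source.
class OnceCell
  DONE = Object.new.freeze # stands in for the special "processed and cached" value

  def initialize
    @state = nil         # nil, the Thread running the body, or DONE
    @value = nil
    @guard = Mutex.new   # emulates an atomic check-and-set on @state
  end

  def fetch
    loop do
      claimed = false
      state = @guard.synchronize do
        if @state.nil?
          @state = Thread.current # case 1: claim the slot for this thread
          claimed = true
        end
        @state
      end

      return @value if state.equal?(DONE) # case 3: already cached

      if claimed
        begin
          value = yield # evaluate the body exactly once
        rescue Exception
          @guard.synchronize { @state = nil } # assumption: allow a later retry
          raise
        end
        @guard.synchronize do
          @value = value
          @state = DONE
        end
        return value
      end

      Thread.pass # case 2: another thread is the winner, wait for it
    end
  end
end

cell = OnceCell.new
calls = 0
calls_lock = Mutex.new

threads = Array.new(4) do
  Thread.new do
    cell.fetch do
      calls_lock.synchronize { calls += 1 }
      sleep 0.05
      Regexp.new("abc")
    end
  end
end
regexps = threads.map(&:value)

puts "body ran #{calls} time(s)"
```

The point of the sketch is that losing threads never run the body; they only spin until the winner publishes the result.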

After this has run once, MRI additionally flushes all the iseqs for the regexp and leaves just the resulting object in the iseq.

So, unless that changes in MRI, we need to make the same guarantee.

We also discovered that IR currently will evaluate all the pieces of the dregexp every time it is encountered, which is at best a performance problem and at worst a severe semantic difference with MRI. So this needs to be fixed.

Marking for 9.1.0.0. Hopefully we can get it in.

headius · 2016-03-16T03:03:10Z

Ping @subbuss, since he and @enebo chatted about how to implement this right.

* Ensure only one regexp is ever cached for //o. This is done using an AtomicReference in the compiled method for JVM6 and a field + atomic updater in the indy call site. In both cases, we may end up evaluating the operands twice, and the code that produced them may still run after caching (a bug, #2798), but we will at least guarantee to return exactly one regexp.
* Add non-boxed paths to construct dregexp with up to 5 elements.
* Add a ThreadContext-local Encoding[1] to use for encoding negotiation when preprocessing the dregexp elements.
* If, at JIT time, a once-dregexp has already been encountered and cached in the instr, just emit that regexp directly into the bytecode.

This new logic is faster than what we had before, likely because the locking I put in place for JVM6 was preventing the JVM from jitting (punted out with "COMPILE SKIPPED: invalid parsing" due to a flaw in my code). This new logic is lighter-weight and JITs fine.

Given the benchmark from #3735:

* 9.0.5: 3.87s
* 9.1: 0.70s
* 1.7.24: 0.72s

headius · 2016-03-16T06:51:49Z

I pushed 256e753, which removes the locking I had in the JVM6 jit and replaces it with an atomic reference. The evaluation of the dregexp/o operands may still run in two threads at the same time; that can't be fixed until IR wraps the entire dregexp/o in a locking mechanism as discussed above. So for now, we guarantee the dregexp/o will only ever return one value.
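
To make the weaker guarantee concrete, here is a rough Ruby sketch of "first writer wins" caching. It is not the JRuby code: the actual change uses a Java AtomicReference, while this sketch emulates a set-if-still-empty update with a Mutex, and `FirstWriterWinsCache` and `fetch` are hypothetical names.

```ruby
# Hypothetical illustration of the weaker guarantee; not JRuby source.
class FirstWriterWinsCache
  def initialize
    @value = nil
    @guard = Mutex.new # emulates an atomic "set only if still empty" update
  end

  def fetch
    cached = @guard.synchronize { @value }
    return cached unless cached.nil?

    candidate = yield # not guarded: several threads may evaluate this
    # Only the first candidate is published; later ones are discarded.
    @guard.synchronize { @value ||= candidate }
  end
end

cache = FirstWriterWinsCache.new
evaluations = 0
evaluations_lock = Mutex.new

workers = Array.new(4) do
  Thread.new do
    cache.fetch do
      evaluations_lock.synchronize { evaluations += 1 }
      sleep 0.05
      Regexp.new("abc")
    end
  end
end
cached_regexps = workers.map(&:value)

puts "operands evaluated #{evaluations} time(s)"
puts "distinct regexps returned: #{cached_regexps.map(&:object_id).uniq.size}"
```

Every caller gets the same regexp object, but the operand evaluation count can exceed one, which is exactly the gap that remains relative to MRI.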

headius · 2016-04-20T19:16:37Z

Punting to 9.1.1 since this is unlikely to affect real users (since hopefully the /o regexp would not produce different results over time).

headius · 2016-08-17T20:59:40Z

I'm actually going to call this fixed. It's a grey area, to be sure, and the hassle to guarantee once-only evaluation of the /o regexp still seems excessive to me.

If someone runs into this being a problem, they can file a bug and we can debate it with them and MRI.

headius added core JRuby 9000 labels Apr 2, 2015

headius added this to the JRuby 9.0.0.0 milestone Apr 2, 2015

headius added a commit that referenced this issue Apr 2, 2015

Make dynamic "once" regexp update atomic.

7a750e2

See #2798 for discussion about whether the entire body of a dynamic "once" regexp should be atomic.

enebo modified the milestone: JRuby 9.0.0.0 Jul 14, 2015

headius mentioned this issue Mar 15, 2016

JRuby 9.x slower than 1.7.21 when running Brakeman #3735

Closed

headius added this to the JRuby 9.1.0.0 milestone Mar 16, 2016

headius modified the milestones: JRuby 9.1.1.0, JRuby 9.1.0.0 Apr 20, 2016

headius modified the milestones: JRuby 9.1.1.0, JRuby 9.1.2.0 May 11, 2016

enebo modified the milestones: JRuby 9.1.2.0, JRuby 9.1.3.0 May 23, 2016

headius modified the milestones: JRuby 9.2.0.0, JRuby 9.1.3.0 Aug 17, 2016

headius closed this as completed Aug 17, 2016

enebo modified the milestones: JRuby 9.2.0.0, JRuby 9.1.9.0 Mar 10, 2017

enebo removed this from the JRuby 9.2.0.0 milestone Mar 10, 2017

headius added a commit that referenced this issue Sep 17, 2019

Make clear this is a failure we do not intend to fix. GH-2798.

381fc7f
