Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamic "once" regexps are not as atomic as in MRI #2798

Closed
headius opened this issue Apr 2, 2015 · 5 comments
Closed

Dynamic "once" regexps are not as atomic as in MRI #2798

headius opened this issue Apr 2, 2015 · 5 comments

Comments

@headius
Copy link
Member

headius commented Apr 2, 2015

The test_once_multithread test in MRI's ruby/test_regexp.rb tries to guarantee that the contents of a dynamic "once" regexp (e.g. /#{some_call}/o) will execute exactly once. Current JRuby master does not even guarantee that it will update atomically, allowing "last thread wins" with potentially multiple updates. The latter issue I have a fix for, but I don't have a way to enforce synchronization around the entire //o body...because IR compiles it like this:

    %v_1 = copy("")
    %v_2 = call_0o(%self ;n:some_call, t:VA, cl:false)
    %v_3 = build_dregexp(%v_1, #{%v_2} ;options: RegexpOptions(kcode: NONE, kcodeDefault, once))

My fix just puts atomic guarantees around the update of the cache.

I have filed https://bugs.ruby-lang.org/issues/11026 to clarify with ruby-core how atomic a dynamic "once" regexp should actually be.

@headius headius added this to the JRuby 9.0.0.0 milestone Apr 2, 2015
headius added a commit that referenced this issue Apr 2, 2015
See #2798 for discussion about whether the entire body of a
dynamic "once" regexp should be atomic.
@enebo enebo modified the milestone: JRuby 9.0.0.0 Jul 14, 2015
@headius
Copy link
Member Author

headius commented Mar 16, 2016

After some discussion with @enebo, we figured out that //o regexp do in fact prevent multiple threads (i.e. any threads after the first one) from running the interpolated bits, using the "once" instruction in MRI. "once" operates by checking a state field in the iseq:

  • If it is null, it sets it to the current thread and proceeds to evaluated the regexp. When complete, it sets a special value to indicate the regexp has been processed and cached.
  • If it is set to another thread, that thread goes into a Thread.pass loop while waiting for the winner thread to complete.
  • If it is set to the special value, the regexp is just returned.

And once this has run once, MRI additionally flushes all the iseqs for the regexp and just leaves the resulting object in the iseq.

So, unless that changes in MRI, we need to make the same guarantee.

We also discovered that IR currently will evaluate all the pieces of the dregexp every time it is encountered, which is at best a performance problem and at worst a severe semantic difference with MRI. So this needs to be fixed.

Marking for 9.1.0.0. Hopefully we can get it in.

@headius headius added this to the JRuby 9.1.0.0 milestone Mar 16, 2016
@headius
Copy link
Member Author

headius commented Mar 16, 2016

Ping @subbuss since @enebo and he chatted about how to implement this right.

headius added a commit that referenced this issue Mar 16, 2016
* Ensure only one regexp is ever cached for //o

This is done using an AtomicReference in the compiled method for
JVM6 and a field + atomic updater in the indy call site. In both
cases, we may end up evaluating the operands twice, and the code
that produced them may still run after caching (a bug,
#2798), but we will at least guarantee to return
exactly one regexp.

* Add non-boxed paths to construct dregexp with up to 5 elements.

* Add a ThreadContext-local Encoding[1] to use for encoding
  negotiation when preprocessing the dregexp elements.

* If, at JIT time, a once-dregexp has already been encountered and
  cached in the instr, just emit that regexp directly into the
  bytecode.

This new logic is faster than what we had before, likely because
the locking I put in place for JVM6 was preventing the JVM from
jitting (punted out with "COMPILE SKIPPED: invalid parsing" due
to a flaw in my code). This new logic is lighter-weight and JITs
fine. Given the benchmark from #3735:

9.0.5:  3.87s
9.1:    0.70s
1.7.24: 0.72s
@headius
Copy link
Member Author

headius commented Mar 16, 2016

I pushed 256e753, which removes the locking I had in the JVM6 jit and replaces it with an atomic reference. The evaluation of the dregexp/o operands may run in two threads at the same time, but that can't be fixed right now until IR wraps the entire dregexp/o in a locking mechanism as discussed above. So for now, we guarantee the dregexp/o will only ever return one value.

@headius
Copy link
Member Author

headius commented Apr 20, 2016

Punting to 9.1.1 since this is unlikely to affect real users (since hopefully the /o regexp would not produce different results over time).

@headius headius modified the milestones: JRuby 9.1.1.0, JRuby 9.1.0.0 Apr 20, 2016
@headius headius modified the milestones: JRuby 9.1.1.0, JRuby 9.1.2.0 May 11, 2016
@enebo enebo modified the milestones: JRuby 9.1.2.0, JRuby 9.1.3.0 May 23, 2016
@headius headius modified the milestones: JRuby 9.2.0.0, JRuby 9.1.3.0 Aug 17, 2016
@headius
Copy link
Member Author

headius commented Aug 17, 2016

I'm actually going to call this fixed. It's a grey area, to be sure, and the hassle to guarantee once-only evaluation of the /o regexp still seems excessive to me.

If someone runs into this being a problem, they can file a bug and we can debate it with them and MRI.

@headius headius closed this as completed Aug 17, 2016
@enebo enebo modified the milestones: JRuby 9.2.0.0, JRuby 9.1.9.0 Mar 10, 2017
@enebo enebo removed this from the JRuby 9.2.0.0 milestone Mar 10, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants