Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes to local variables not visible to other threads even when protected by a lock #3891

Closed
grddev opened this issue May 16, 2016 · 20 comments

Comments

@grddev
Copy link
Contributor

grddev commented May 16, 2016

In JRuby 1.7 (all versions we have tested, including 1.7.25), the following script runs forever. The distilled test-case decrements a variable from two threads, while the main thread waits for all decrements have been applied. The decrements are then checked to ensure that each decrement call has seen a unique value of the counter.

The example is obviously nonsensical, but distilled from a much larger program where the counter was decremented in multiple threads, and then when the count reached zero, the code would continue. Tracking the intermediate values was added in order to facilitate debugging.

loop do
  print '.'

  remaining = 200
  queue = Queue.new
  lock = Mutex.new

  callbacks = Array.new(remaining) do |i|
    lambda do
      local_remaining = nil
      lock.lock
      begin
        remaining -= 1
        local_remaining = remaining 
      ensure
        lock.unlock
      end
      queue.push(local_remaining)
    end
  end

  Thread.start { callbacks[0, 100].each(&:call) }
  Thread.start { callbacks[100, 100].each(&:call) }

  counts = callbacks.map { queue.pop }

  if counts.uniq != counts
    puts "Duplicate values registered #{counts}"
    exit 1
  end
end

In JRuby 9k (tried 9.0.5.0 and 9.1.0.0-SNAPSHOT (2.3.0) 2016-05-03 9ee8643) the script terminates quickly with duplicates in the returned counts.

.Duplicate values registered [199, 199, 198, 198, 197, 196, 195, 194, 193, 192, 191, 190, 189, 188, 187, 186, 186, 185, 185, 184, 184, 183, 183, 182, 182, 181, 180, 180, 179, 179, 178, 177, 176, 175, 174, 173, 172, 171, 170, 169, 168, 167, 166, 165, 164, 163, 162, 161, 160, 159, 158, 157, 156, 155, 154, 153, 152, 151, 150, 149, 148, 147, 146, 145, 144, 143, 142, 141, 140, 139, 138, 138, 137, 136, 135, 134, 133, 132, 132, 131, 131, 130, 130, 129, 129, 128, 127, 126, 125, 124, 123, 122, 121, 120, 119, 118, 118, 117, 117, 116, 116, 115, 114, 113, 113, 112, 111, 110, 109, 108, 107, 106, 105, 104, 103, 102, 101, 101, 100, 100, 99, 99, 98, 98, 97, 97, 96, 95, 95, 94, 94, 93, 92, 91, 90, 90, 89, 89, 88, 88, 87, 87, 86, 86, 85, 84, 83, 82, 81, 80, 79, 78, 77, 76, 75, 74, 74, 73, 73, 72, 71, 70, 69, 68, 67, 67, 66, 66, 65, 65, 64, 64, 63, 62, 61, 60, 59, 59, 58, 57, 56, 55, 54, 54, 53, 52, 51, 51, 50, 50, 49, 49, 48, 48, 47, 46, 45, 44, 43, 42]

The output contains a . for each attempt, and contains a list of each observed value of the counter. As you can see in the output above, the values 199, 186, etc. are repeated (an no value below 42 is present).

Somehow, the updates to the value of remaining is not visible to other threads as expected. Notice that remaining is only read (and copied into local_remaining) inside the critical section.

Changing the script slightly removes the problem. For example:

  • Initializing remaining to [200] and changing all references to read and write the first element of the array.
  • Replacing lock.lock and lock.unlock with lock.synchronize.
  • Wrapping remaining -= 1 in proc { ... }.call.
  • Changing remaining to an instance variable @remaining.

To eliminate any issues with Queue and Mutex we also reproduced the same issues with java.util.concurrent.locks.ReentrantLock and java.util.concurrent.ArrayBlockingQueue.

Our conclusion is that updates to local variables are not guaranteed to be visible to other threads, even if updates are protected with critical sections.

@iconara
Copy link
Contributor

iconara commented May 16, 2016

After thinking about this for a few more minutes we realized that there's an even simpler way to illustrate the problem:

lock = Mutex.new
counter = 0

threads = Array.new(2) do
  Thread.start do
    10_000.times do
      lock.lock
      begin
        counter += 1
      ensure
        lock.unlock
      end
    end
  end
end

threads.map(&:join)

puts("#{JRUBY_VERSION}: #{counter}")
$ rvm jruby-1.7.25,jruby-9.0.5.0 do ruby jr9k.rb
1.7.25: 20000
9.0.5.0: 12835

Different numbers are printed by JRuby 9K for every run.

@headius
Copy link
Member

headius commented May 16, 2016

You are correct, and this is likely due to an optimization we do in 9k that we'll have to back off.

The optimization I'm speaking of is that we will only publish closed-over variables near calls, to avoid the cost of going to/from variables stored in an object on the heap. This works ok for a single thread but if multiple threads are there they may not see changes other threads have made.

To test this, I ran your script with -X-C, which only runs in our interpreter (which does not do this optimization normally). It runs just fine.

@headius
Copy link
Member

headius commented May 16, 2016

Ping @subbuss and @enebo. I tried naively turning off AddLocalVarLoadStoreInstructions but that just broke, probably because variables got no update anymore.

@subbuss
Copy link
Contributor

subbuss commented May 16, 2016

I am running out the door .. but, lock.lock and lock.unlock are both calls and so this should have worked as expected. I suspect something is up with the ensure. Try without the ensure and see if it works. I can investigate later in the night.

@iconara
Copy link
Contributor

iconara commented May 16, 2016

Removing begin … ensure makes no difference, unfortunately.

@headius
Copy link
Member

headius commented May 16, 2016

Here's the main part of the IR for the closure, with begin/ensure, queue access, and "local_remaining" removed:

BB [1:CL3_LBL_12:-1]
    %cl_3_15 = copy(nil)
    %cl_3_14 = copy(%cl_3_15)
    %cl_3_4 = copy(%cl_3_15)
    %t_remaining_12 = copy(%cl_3_15)
    %t_lock_13 = copy(%cl_3_15)
    %cl_3_6 = copy(%cl_3_15)
    %cl_3_7 = copy(%cl_3_15)
    %cl_3_11 = copy(%cl_3_15)
    %cl_3_8 = copy(%cl_3_15)
    %cl_3_10 = copy(%cl_3_15)
    %cl_3_14 = save_binding_viz()
    %cl_3_15 = push_block_frame()
    push_block_binding()
    update_block_state(%self)
    %t_lock_13 = binding_load(lock(1:2:local=true) ;scope: blah.rb_CLOSURE_1_CLOSURE_2_CLOSURE_3)
    %t_remaining_12 = binding_load(remaining(1:0:local=true) ;scope: blah.rb_CLOSURE_1_CLOSURE_2_CLOSURE_3)
BB [2:CL3_LBL_13:-1]
BB [3:_CLOSURE_START_0:6]
    line_num(;n: 10)
    %cl_3_3 = call_0o(%t_lock_13 ;n:lock, t:NO, cl:false)
    line_num(;n: 11)
    %cl_3_4 = get_global_var($!)
BB [4:CL3_LBL_2:11]
    line_num(;n: 12)
    %cl_3_7 = call_1f(%t_remaining_12, Fixnum:1 ;n:-, t:NO, cl:false)
    %t_lock_13 = binding_load(lock(1:2:local=true) ;scope: blah.rb_CLOSURE_1_CLOSURE_2_CLOSURE_3)
    %t_remaining_12 = copy(%cl_3_7)
    %cl_3_6 = copy(%cl_3_7)
BB [5:CL3_LBL_10:18]
    line_num(;n: 15)
    binding_store(%t_remaining_12, remaining(1:0:local=true) ;scope_name: blah.rb_CLOSURE_1_CLOSURE_2_CLOSURE_3)
    %cl_3_5 = call_0o(%t_lock_13 ;n:unlock, t:NO, cl:false)
    pop_binding()
    restore_binding_viz(%cl_3_14)
    pop_block_frame(%cl_3_15)
    return(%cl_3_6)
...

The binding load of remaining happens outside the locked section, so multiple threads will see the same initial value for remaining and duplicate results.

@subbuss
Copy link
Contributor

subbuss commented May 16, 2016

Yes, this is a real bug ... 'counter' is not reloaded inside the locked region because "blah.lock" doesn't have a closure passed into it, and the analysis assumes that the call cannot modify any scope vars .. which is true (except for evals but we handle that specially already). But, lock is special here .. anyway, needs further investigation and discussion.

@subbuss
Copy link
Contributor

subbuss commented May 16, 2016

Anyway, simplest solution for now is to turn off the AddLocalVar* pass and make sure JIT adds the loads/stores for local var access. I can take a look if that doesn't work or if something breaks.

@headius
Copy link
Member

headius commented May 16, 2016

When I removed AddLocalVar, I got a "nil cannot be coerce to X" type warning, so I imagine some value was not getting loaded properly anymore. I did not dig any deeper than that.

What do you mean "JIT adds the loads/stores for local var acess"?

@grddev
Copy link
Contributor Author

grddev commented May 17, 2016

Here is a script that I believe demonstrates the effect deterministically:

counter = 0
q1 = Queue.new
q2 = Queue.new

t1 = Thread.start do
  counter += 1
  q1 << nil
  q2.pop
  counter += 1
end

t2 = Thread.start do
  q1.pop
  counter += 1
  q2 << nil
end

t1.join
t2.join

puts("#{JRUBY_VERSION}: #{counter}")

The counter variable is incremented three times, in two different threads. The two queues are used to ensure execution of the threads are interleaved in the (un)desired order. The should produce 3 every time, but (at least seemingly) deterministically produces 2 in JRuby 9k. When passing -X-C, the code prints 3 also in JRuby 9k, as expected.

In addition to serving as a deterministic example, it also shows that Mutex#lock is far from the only method that can cause the value of a variable to change in a multi-threaded environment.

@subbuss
Copy link
Contributor

subbuss commented May 17, 2016

Mutex#lock is far from the only method that can cause the value of a variable to change in a multi-threaded environment.

Yup, we figured out y'day that this is a bigger bug. We have a couple different fix strategies that will handle this.

headius added a commit to headius/jruby that referenced this issue May 17, 2016
This was a good experiment, but we're not properly ensuring the
heap variables are being loaded live when needed, causing examples
like that in jruby#3891 to fail to propagate changes across threads.
By implementing LocalVariable load/store logic in JIT and turning
off the "Add" pass, we basically revert heap vars to always being
read/written immediately, as in JRuby 1.7.25.

It may be possible to improve the pass so that it localizes the
loads and stores better and ensures we don't miss updates we
should see, but this commit will test whether the "nuclear option"
passes all our suites.
@headius
Copy link
Member

headius commented May 17, 2016

@grddev If you'd like, you can test out the disable_add_loadstore branch on my jruby fork to see if it fixes all your issues. It appears to run your sample scripts correctly.

This may not be the eventual fix, but I pushed the PR to test and discuss.

@subbuss
Copy link
Contributor

subbuss commented May 18, 2016

The problem seems to be that the analyses think that the binding for the blocks passed to the threads haven't escaped, but it is not true since counter is present in the parent scope and given that the blocks reference a variable in the parent scope, the binding for the block's scopes have escaped (i.e. can be modified outside the current execution context).

You can see that all the threading runs properly if you introduce a boo{} in the thread bodies where boo is simply def boo; end

So, the fix might be as simple as fixing the scope.bindingHasEscaped() method.

@subbuss
Copy link
Contributor

subbuss commented May 18, 2016

Separately, once that fix is done, it is worth evaluating perf. with / without this pass since the JIT can now handle the pass being turned off.

headius added a commit to headius/jruby that referenced this issue May 20, 2016
This was a good experiment, but we're not properly ensuring the
heap variables are being loaded live when needed, causing examples
like that in jruby#3891 to fail to propagate changes across threads.
By implementing LocalVariable load/store logic in JIT and turning
off the "Add" pass, we basically revert heap vars to always being
read/written immediately, as in JRuby 1.7.25.

It may be possible to improve the pass so that it localizes the
loads and stores better and ensures we don't miss updates we
should see, but this commit will test whether the "nuclear option"
passes all our suites.
@subbuss
Copy link
Contributor

subbuss commented May 22, 2016

So, I started looking at the code to fix it and realized that my earlier hunch about 'bindingHasEscaped' was wrong because 'bindingHasEscaped' has a very specific semantics which is that the current dyn-scope has escaped.

Anyway, a little bit more code reading shows me that

// All variables not defined in the current scope have to be always loaded
// because of multi-threading scenarios where some other scope
// could update this variable concurrently.
//
// Allocate a new hash-set and modify it to get around ConcurrentModificationException on reqdLoads
Set<LocalVariable> newReqdLoads = new HashSet<LocalVariable>(reqdLoads);
for (LocalVariable v: reqdLoads) {
if (!scope.definesLocalVariable(v)) newReqdLoads.remove(v);
}
reqdLoads = newReqdLoads;
is the code to handle the exact scenario in this bug report.

I am looking for "!scope.definesLocalVariable(v)" and the use/def vars are set up in

public void setUpUseDefLocalVarMaps() {
definedLocalVars = new HashSet<>(1);
usedLocalVars = new HashSet<>(1);
for (BasicBlock bb : getCFG().getBasicBlocks()) {
for (Instr i : bb.getInstrs()) {
for (Variable v : i.getUsedVariables()) {
if (v instanceof LocalVariable) usedLocalVars.add((LocalVariable) v);
}
if (i instanceof ResultInstr) {
Variable v = ((ResultInstr) i).getResult();
if (v instanceof LocalVariable) definedLocalVars.add((LocalVariable) v);
}
}
}
for (IRClosure cl : getClosures()) {
cl.setUpUseDefLocalVarMaps();
}
}
.... however there is a semantic mismatch here. What the analysis needs is vars that are defined in an outer scope .. but IRScope is setting up vars that are updated in the current scope (independent of whether it is defined in an outer scope or not). However, in the code snippets in this bug report ... 'counter' is defined in an outer scope but also updated in the current scope. Anyway, I'll fix this semantic mismatch and this should be fixed.

@subbuss
Copy link
Contributor

subbuss commented May 22, 2016

Sorry about the wild-goose chase .. in the end, the fix was very simple .. I was trying to diagnose the problem when multi-tasking heavily instead of context switching jruby properly and looking at the code.

@subbuss
Copy link
Contributor

subbuss commented May 22, 2016

Anyway, one good thing about me sending everyone on a wild-goose chase is that we we got #3898 and related fixes which now gives us another perf. knob. So it is worth merging that PR and see how performance differs with and without the pass enabled.

@enebo enebo added this to the JRuby 9.1.3.0 milestone May 25, 2016
headius added a commit that referenced this issue May 26, 2016
Disable the AddLocalVarLoadStore pass to fix #3891.
@headius
Copy link
Member

headius commented May 26, 2016

I have merged the logic to JIT properly without ALVLS, so we can now evaluate both modes and see whether we want to turn it back on.

@headius
Copy link
Member

headius commented Jun 16, 2016

I don't think we ever evaluated whether to reinstate ALVLS, did we? @subbuss so you think ALVLS would no longer exhibit this problem? I think evaluation will mostly involve a study of how many instructions (IR and JVM) are produced with and without ALVLS on some typical pieces of code.

@subbuss
Copy link
Contributor

subbuss commented Jun 16, 2016

Ya, I don't think it will ... Yes, evaluate code size + perf. I suspect perf. changes will mostly be noise since most of the benefit comes from dynscope removal which was originally part of this ALVLS pass but which I split out of this since I could deduce that without this analysis.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants