
[resolv.rb] ConcurrencyError: interrupted waiting for mutex: null #4048

Open
purbon opened this issue Aug 4, 2016 · 3 comments


purbon commented Aug 4, 2016

Environment

  • jruby 1.7.23
  • Tested on OS X 10.10.5, but it has also happened on CentOS 7, so it looks like an OS-independent bug.
    See [1] [2] for user reports and their specific environments.

Users reporting the same situation describe their environments in those reports.

Some environments appear to have configuration management changing the /etc/resolv file regularly; I am checking this with the users.

Expected Behavior

The host resolver should run without running into concurrency issues.

Actual Behavior

On the Logstash side, a few users reported [1] [2] that under heavy load they got this exception:

"exception"=>#<ConcurrencyError: interruptedwaitingformutex: null>,
    "backtrace"=>["org/jruby/ext/thread/Mutex.java:94:in `lock'",
    "org/jruby/ext/thread/Mutex.java:147:in `synchronize'",
    "/opt/logstash/vendor/jruby/lib/ruby/1.9/resolv.rb:190:in `lazy_initialize'",
    "/opt/logstash/vendor/jruby/lib/ruby/1.9/resolv.rb:268:in `each_name'",
    "/opt/logstash/vendor/jruby/lib/ruby/1.9/resolv.rb:151:in `each_name'",
    "org/jruby/RubyArray.java:1613:in `each'",
    "/opt/logstash/vendor/jruby/lib/ruby/1.9/resolv.rb:150:in `each_name'",
    "/opt/logstash/vendor/jruby/lib/ruby/1.9/resolv.rb:132:in `getname'",
    "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-dns-2.1.3/lib/logstash/filters/dns.rb:244:in `getname'",
    "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-dns-2.1.3/lib/logstash/filters/dns.rb:231:in `retriable_getname'",
    "org/jruby/RubyProc.java:281:in `call'",
    "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-dns-2.1.3/lib/logstash/filters/dns.rb:216:in `retriable_request'",
    "org/jruby/ext/timeout/Timeout.java:115:in `timeout'",
    "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-dns-2.1.3/lib/logstash/filters/dns.rb:215:in `retriable_request'",
    "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-dns-2.1.3/lib/logstash/filters/dns.rb:230:in `retriable_getname'",
    "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-dns-2.1.3/lib/logstash/filters/dns.rb:178:in `reverse'",
    "org/jruby/RubyArray.java:1613:in `each'",
    "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-dns-2.1.3/lib/logstash/filters/dns.rb:156:in `reverse'",

(partially trimmed)

The actual Logstash environment runs several threads with instances of this code, so the resolver is shared between them.

It has been tricky to reproduce this with Logstash; however, I've been able to isolate the problem, independently of Logstash, in the JRuby Resolv::Hosts class. You can find a script to reproduce it at https://gist.github.com/purbon/106e80a5d85b3a6fbf1e5f10b5d0d643

The problem is that the Resolv::Hosts#lazy_initialize mutex is locked and the threads get interrupted while waiting for it, due to the timeout usage. After analysing the users' environments, it could be that some other process is modifying the hosts file regularly, which might be contributing to this, but I'm not sure.
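
To make the failure mode concrete, here is a minimal sketch of the pattern that seems to trigger it. This is hypothetical and simplified, not the gist's code: the shared resolver and the per-lookup Timeout mirror what logstash-filter-dns does, but the address, thread count, and timeout value are illustrative only.

    # Hypothetical sketch: one Resolv instance shared across threads, each lookup
    # wrapped in Timeout.timeout, roughly the shape used by logstash-filter-dns.
    require 'resolv'
    require 'timeout'

    resolver = Resolv.new  # shared; first use runs Resolv::Hosts#lazy_initialize under a mutex

    threads = 20.times.map do
      Thread.new do
        100.times do
          begin
            # An aggressive timeout so the interrupt is likely to fire while a
            # thread is still blocked on the lazy_initialize mutex.
            Timeout.timeout(0.01) { resolver.getname("8.8.8.8") }
          rescue Timeout::Error, Resolv::ResolvError
            # Timeouts and failed lookups are expected under load; on JRuby the
            # ConcurrencyError above can surface instead when the interrupt
            # lands during the mutex wait.
          end
        end
      end
    end

    threads.each(&:join)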

I hope the issue is clear and that you have all the information you need; don't hesitate to ask for more context, debug information, or anything else you might need.

Thanks a lot,

  • purbon

headius commented Aug 24, 2016

Does this fail on 1.7.25 as well? 9.1.2.0/9.1.3.0?


headius commented Aug 24, 2016

I had 1.7 handy so I tested. It still fails on 1.7 HEAD. It also fails on master.

Investigating.


headius commented Aug 24, 2016

Unfortunately the given script doesn't really tell me anything :-(

You have it set up to do a native interrupt while it's waiting on the mutex. That does indeed produce the same (somewhat cryptic) error, but it doesn't tell me why the mutex is being interrupted in your real-world case. We need more information on how threads are getting interrupted at a native level and why it is happening while they wait on this particular mutex.
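
For reference, the bare interruption pattern being described, a thread killed at the native level while blocked on a Mutex, can be sketched like this (hypothetical illustration, not the gist's actual code):

    # Hypothetical illustration: interrupt a thread while it is blocked on
    # Mutex#lock. On JRuby the interrupted wait is reported as
    # ConcurrencyError: interrupted waiting for mutex.
    mutex = Mutex.new
    mutex.lock                # main thread holds the lock

    worker = Thread.new do
      mutex.synchronize { }   # blocks here, waiting for the mutex
    end

    sleep 0.5                 # let the worker reach the blocked state
    worker.kill               # interrupt it while it waits on the mutex

    begin
      worker.join
    rescue Exception => e     # on JRuby this can be the ConcurrencyError above
      puts e.inspect
    end

The open question is what, in the real Logstash environment, plays the role of that kill/interrupt.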
