Resolv fails under load with SocketError: bind: name or service not known #3659

Closed
jsvd opened this issue Feb 12, 2016 · 17 comments

Comments

@jsvd
Contributor

jsvd commented Feb 12, 2016

I set up dnsmasq on my Mac and ran the following script:

require 'resolv'
Resolv::DNS.open(:nameserver => "127.0.0.1") do |dns|
  10000.times.each do |i|
    begin
      dns.getaddress("server-a.my.lan")
      print "#{i} " if (i % 2000 == 0)
    rescue => e
      puts i
      raise
    end
  end
end

% rvm use 1.9.3
Using /Users/joaoduarte/.rvm/gems/ruby-1.9.3-p551
% ruby dns.rb
0 2000 4000 6000 8000
% rvm use 2.2.1
Using /Users/joaoduarte/.rvm/gems/ruby-2.2.1
% ruby dns.rb
0 2000 4000 6000 8000
% rvm use jruby-1.7.23
Using /Users/joaoduarte/.rvm/gems/jruby-1.7.23
% ruby dns.rb
0 136
SocketError: bind: name or service not known
                bind at org/jruby/ext/socket/RubyUDPSocket.java:160
    bind_random_port at /Users/joaoduarte/.rvm/rubies/jruby-1.7.23/lib/ruby/1.9/resolv.rb:638
          initialize at /Users/joaoduarte/.rvm/rubies/jruby-1.7.23/lib/ruby/1.9/resolv.rb:777
  make_udp_requester at /Users/joaoduarte/.rvm/rubies/jruby-1.7.23/lib/ruby/1.9/resolv.rb:543
       each_resource at /Users/joaoduarte/.rvm/rubies/jruby-1.7.23/lib/ruby/1.9/resolv.rb:500
        each_address at /Users/joaoduarte/.rvm/rubies/jruby-1.7.23/lib/ruby/1.9/resolv.rb:396
          getaddress at /Users/joaoduarte/.rvm/rubies/jruby-1.7.23/lib/ruby/1.9/resolv.rb:372
              (root) at dns.rb:5
               times at org/jruby/RubyFixnum.java:280
                each at org/jruby/RubyEnumerator.java:274
              (root) at dns.rb:3
                open at /Users/joaoduarte/.rvm/rubies/jruby-1.7.23/lib/ruby/1.9/resolv.rb:307
              (root) at dns.rb:2
@jsvd
Contributor Author

jsvd commented Feb 12, 2016

For completeness' sake:

% rvm use jruby-9.0.5.0
Using /Users/joaoduarte/.rvm/gems/jruby-9.0.5.0
% ruby dns.rb
0 109
SocketError: bind: name or service not known
                bind at org/jruby/ext/socket/RubyUDPSocket.java:167
    bind_random_port at /Users/joaoduarte/.rvm/rubies/jruby-9.0.5.0/lib/ruby/stdlib/resolv.rb:658
          initialize at /Users/joaoduarte/.rvm/rubies/jruby-9.0.5.0/lib/ruby/stdlib/resolv.rb:809
  make_udp_requester at /Users/joaoduarte/.rvm/rubies/jruby-9.0.5.0/lib/ruby/stdlib/resolv.rb:563
      fetch_resource at /Users/joaoduarte/.rvm/rubies/jruby-9.0.5.0/lib/ruby/stdlib/resolv.rb:520
       each_resource at /Users/joaoduarte/.rvm/rubies/jruby-9.0.5.0/lib/ruby/stdlib/resolv.rb:513
        each_address at /Users/joaoduarte/.rvm/rubies/jruby-9.0.5.0/lib/ruby/stdlib/resolv.rb:410
          getaddress at /Users/joaoduarte/.rvm/rubies/jruby-9.0.5.0/lib/ruby/stdlib/resolv.rb:386
     block in dns.rb at dns.rb:5
               times at org/jruby/RubyFixnum.java:302
                each at org/jruby/RubyEnumerator.java:293
     block in dns.rb at dns.rb:3
                open at /Users/joaoduarte/.rvm/rubies/jruby-9.0.5.0/lib/ruby/stdlib/resolv.rb:306
               <top> at dns.rb:2

@jsvd
Contributor Author

jsvd commented Feb 12, 2016

After a bit more investigation, I see the problem: in resolv, when a bind is attempted on a random port, the rescue block expects an exception from the Errno family:

    def self.bind_random_port(udpsock, bind_host="0.0.0.0") # :nodoc:
      begin
        port = rangerand(1024..65535)
        udpsock.bind(bind_host, port)
      rescue Errno::EADDRINUSE, # POSIX
             Errno::EACCES, # SunOS: See PRIV_SYS_NFS in privileges(5)
             Errno::EPERM # FreeBSD: security.mac.portacl.port_high is configurable.  See mac_portacl(4).
        retry
      end
    end

So I created a small script to test which exception is thrown if the port is blocked:

$ cat udp.rb
require 'socket'
u1 = UDPSocket.new
u1.bind("127.0.0.1", 53333)
u2 = UDPSocket.new
begin
  u2.bind("127.0.0.1", 53333)
rescue => e
  puts e.class
  puts e.message
  puts e.class.ancestors.inspect
end

So ruby mri 2.2.1 behaves as expected:

$ rvm use 2.2.1
Using /Users/joaoduarte/.rvm/gems/ruby-2.2.1
$ ruby udp.rb
Errno::EADDRINUSE
Address already in use - bind(2) for "127.0.0.1" port 53333
[Errno::EADDRINUSE, SystemCallError, StandardError, Exception, Object, Kernel, BasicObject]

But JRuby throws a SocketError instead:

$ rvm use jruby-1.7.23
Using /Users/joaoduarte/.rvm/gems/jruby-1.7.23
$ ruby udp.rb
SocketError
bind: name or service not known
[SocketError, StandardError, Exception, Object, Kernel, BasicObject]
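
A variant of the script above that lets the OS choose the first port and then reports whether the resulting exception is one that the rescue clause in resolv.rb (quoted earlier) would retry. `duplicate_bind_error` and `RESCUED_BY_RESOLV` are hypothetical names; this is a sketch for checking a given runtime, not part of the original report:

```ruby
require 'socket'

# Exception classes that bind_random_port in resolv.rb rescues and retries on.
RESCUED_BY_RESOLV = [Errno::EADDRINUSE, Errno::EACCES, Errno::EPERM]

# Hypothetical helper: binds two UDP sockets to the same host/port and
# returns the exception raised by the second bind (nil if none was raised).
def duplicate_bind_error(host = "127.0.0.1")
  first = UDPSocket.new
  first.bind(host, 0)        # port 0 asks the OS for any free port
  port = first.addr[1]       # addr is [family, port, hostname, ip]
  second = UDPSocket.new
  begin
    second.bind(host, port)
    nil
  rescue StandardError => e
    e
  ensure
    second.close
    first.close
  end
end

err = duplicate_bind_error
if err.nil?
  puts "second bind succeeded; nothing to classify"
else
  retried = RESCUED_BY_RESOLV.any? { |klass| err.is_a?(klass) }
  puts "#{err.class}: #{retried ? 'retried by resolv' : 'propagates out of resolv'}"
end
```

If the reported class is not in the rescued list, a port collision inside resolv will surface to the caller instead of being retried.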

@enebo
Member

enebo commented Feb 12, 2016

@jsvd I do not have time to look at this today but I will add some extra info. I can see that this behavior of expecting EADDRINUSE applies all the way back to 1.8.7, so this has been broken for a long time. FWIW, we get back pretty generic error messages from Java's net layer. It looks like we should be examining the string (sad but likely true) to figure out if we should be raising EADDRINUSE.

@jsvd
Contributor Author

jsvd commented Feb 12, 2016

Thanks for the feedback @enebo

@headius
Member

headius commented Feb 13, 2016

Are you able to reproduce this on 9k?

I'm having trouble getting your script to fail the same way. I get ResolvError generally if there's a DNS server at the target address. I'm on OS X.

@headius
Member

headius commented Feb 13, 2016

Even though I can't reproduce it, if you can give me the SocketError trace while passing -Xbacktrace.style=full I should be able to track down where it is coming from and fix it.

@jsvd
Contributor Author

jsvd commented Feb 13, 2016

The problem comes from what @enebo said: binding to a port that is already in use raises a SocketError instead of EADDRINUSE.

Now, the reason why resolv fails in the first place is that getresource calls DNS.bind_random_port (http://www.rubydoc.info/stdlib/resolv/Resolv%2FDNS.bind_random_port) for each request. The method chooses a random port from the non-privileged range. If another application has a UDP socket open, it's only a matter of time until the random choice hits that used port.

Since JRuby raises "the wrong exception", it bubbles up instead of triggering the retry (the normal behaviour).
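
A rough sketch of that "matter of time" argument, assuming each lookup draws one port uniformly from 1024..65535 and that a fixed number of those ports are already bound on the same address. `collision_probability` and `PORT_RANGE_SIZE` are hypothetical names used only for illustration:

```ruby
# Number of candidate ports in the range 1024..65535 used by bind_random_port.
PORT_RANGE_SIZE = 65535 - 1024 + 1

# Probability that at least one of `lookups` independent random picks lands
# on one of `ports_in_use` already-bound ports (uniform-draw assumption).
def collision_probability(ports_in_use, lookups)
  per_lookup = ports_in_use.to_f / PORT_RANGE_SIZE
  1.0 - (1.0 - per_lookup)**lookups
end

[1, 10, 100].each do |busy|
  chance = collision_probability(busy, 10_000)
  puts format("%3d busy ports, 10000 lookups: %.1f%% chance of a collision", busy, chance * 100)
end
```

Under these assumptions even a handful of busy UDP ports makes a collision likely within the 10000 lookups of the test script, which is harmless when the bind is retried and fatal when it is not.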

@headius
Member

headius commented Feb 14, 2016

@jsvd Confirmed! I believe this will be fixed by the ruby-2.3+socket branch (still in progress), which aligns our socket implementations much more closely with CRuby. That should get merged in any day now and be in 9.1.

@headius
Member

headius commented Apr 20, 2016

The socket branch will not make it into 9.1. Bumping.

@headius headius closed this as completed Apr 20, 2016
@headius headius reopened this Apr 20, 2016
@nbarrientos
Contributor

It's likely that we're hitting this issue too on JRuby 1.7.20.1, which is the JRuby version embedded in puppet-server 1.1.3 :( Is there a way to work around it? Quite a few things rely on Resolv over here :/

@headius
Copy link
Member

headius commented May 2, 2016

@nbarrientos It's unlikely we'll be putting a lot of effort into the Socket subsystem on JRuby 1.7, so your best bet would be to try JRuby 9.1 when it comes out. We may not have it fixed, but we'll be closer, and we'll work to get it fixed for a 9.1 update.

@headius
Member

headius commented May 2, 2016

I have a fix for this we could include in 9.1: https://gist.github.com/5b101a92d8c2a3d4cb414140f882ebc5

It's up to @enebo if it's too risky the day of the release :-)

headius added a commit that referenced this issue May 2, 2016
@headius
Member

headius commented May 2, 2016

I've incorporated a localized fix for this issue into 9.1 and we can call this fixed. There's another bug outstanding for the socket rework that still needs to be done.

@headius headius modified the milestones: JRuby 9.1.0.0, JRuby 9.1.1.0 May 2, 2016
@headius headius closed this as completed May 2, 2016
headius added a commit that referenced this issue May 2, 2016
@headius
Member

headius commented May 2, 2016

@nbarrientos I have made the same fix for 1.7, so it will be in 1.7.26 whenever we release that. In the short term your only workaround would be to modify resolv.rb so it also rescues SocketError around bind.
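
A minimal sketch of that kind of workaround, written as a standalone helper rather than an edit to resolv.rb itself. The name `bind_random_port_tolerant` and the `max_attempts` cap are assumptions for illustration; the stock method quoted earlier retries without a limit and picks the port with its own `rangerand` helper instead of `rand`:

```ruby
require 'socket'

# Hypothetical helper (not part of resolv.rb): binds to a random unprivileged
# port, retrying on the Errno family that resolv.rb already handles and also
# on SocketError, which the affected JRuby versions raise for a busy port.
def bind_random_port_tolerant(udpsock, bind_host = "0.0.0.0", max_attempts = 100)
  attempts = 0
  begin
    attempts += 1
    port = rand(1024..65535)
    udpsock.bind(bind_host, port)
    port
  rescue Errno::EADDRINUSE, Errno::EACCES, Errno::EPERM, SocketError
    # Bounded retry, so a persistent failure (e.g. a bad bind_host) still surfaces.
    retry if attempts < max_attempts
    raise
  end
end

sock = UDPSocket.new
port = bind_random_port_tolerant(sock, "127.0.0.1")
puts "bound to port #{port}"
sock.close
```

Rescuing SocketError this broadly would also catch genuine name-resolution failures, which is why the sketch caps the number of retries instead of looping forever.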

@nbarrientos
Contributor

Thanks.

@perlun
Contributor

perlun commented Aug 15, 2016

@headius - any chance we could get a 1.7.26 to get this fix incorporated? It would be Very Nice indeed. 😇

@perlun
Contributor

perlun commented Sep 8, 2016

@headius - just to make things extremely clear, was this included in 1.7.26? I think I failed to find it when I skimmed through the release notes.


5 participants