threaded calls to InetAddress.getByName stop working after NIC disconnects twice #4549

Closed
jsvd opened this issue Mar 28, 2017 · 9 comments

Comments

@jsvd
Contributor

jsvd commented Mar 28, 2017

Environment

  • Red Hat Enterprise Linux Server release 7.3 (Maipo)
  • Amazon EC2 T2 Micro
  • uname -a: Linux ip-10-77-5-109.eu-west-1.compute.internal 3.10.0-514.el7.x86_64 #1 SMP Wed Oct 19 11:24:13 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
  • jruby 9.1.8.0 (2.3.1) 2017-03-06 90fc7ab OpenJDK 64-Bit Server VM 25.121-b13 on 1.8.0_121-b13 +jit [linux-x86_64]
  • Problem also occurs with jruby 1.7.25 (1.9.3p551) 2016-04-13 867cb81 on OpenJDK 64-Bit Server VM 1.8.0_121-b13 +jit [linux-amd64]

Expected Behavior

  • If a network card is disabled and re-enabled, subsequent calls to InetAddress.getByName(host) should work

Steps to reproduce:

  1. Create a CentOS VM on Amazon EC2
  2. Install Java 8 and JRuby
  3. Start screen with 2 windows
    3.1 In the first window, run the Ruby script
    3.2 In the second window, run nmcli device disconnect eth0 && sleep 15 && nmcli device connect eth0 to temporarily disable the NIC
  4. Log back in to the machine
  5. Run screen -r
  6. Repeat step 3.2

Example script:

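java_import java.net.InetAddress

HOST = "ec2-34-252-184-81.eu-west-1.compute.amazonaws.com"

# Resolve HOST through the JDK and print the result, or the exception on failure.
def lookup
  puts InetAddress.getByName(HOST)
rescue Exception => e
  puts e.inspect
end

# Look the host up every 3 seconds on a background thread.
thread = Thread.new do
  loop { lookup; sleep 3 }
end

# Keep the main thread (and so the process) alive for 500 seconds.
sleep 500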

Actual Behavior

  • After the NIC is disconnected a second time, the script is never able to resolve the host to an IP again

Console output:

$ jruby script.rb
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/10.77.39.169
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/10.77.39.169
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/10.77.39.169
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/10.77.39.169
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/10.77.39.169
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/10.77.39.169
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/10.77.39.169
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/10.77.39.169
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/10.77.39.169
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/10.77.39.169
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/10.77.39.169
java.net.UnknownHostException: ec2-34-252-184-81.eu-west-1.compute.amazonaws.com: Name or service not known
java.net.UnknownHostException: ec2-34-252-184-81.eu-west-1.compute.amazonaws.com
java.net.UnknownHostException: ec2-34-252-184-81.eu-west-1.compute.amazonaws.com
java.net.UnknownHostException: ec2-34-252-184-81.eu-west-1.compute.amazonaws.com
java.net.UnknownHostException: ec2-34-252-184-81.eu-west-1.compute.amazonaws.com: Name or service not known
java.net.UnknownHostException: ec2-34-252-184-81.eu-west-1.compute.amazonaws.com
java.net.UnknownHostException: ec2-34-252-184-81.eu-west-1.compute.amazonaws.com
##  ... forever ...

NOTE:
Using the equivalent Java code, the problem doesn't occur: after a couple of exceptions the code is able to resolve the host again.

import java.net.InetAddress;
import java.util.concurrent.TimeUnit;

public class HostName {
    public static void main(String[] args) throws Exception {
        Thread t = new Thread() {
            // Print the resolved address, or the exception if the lookup fails.
            public void resolve(String host) {
                try {
                    System.out.println(InetAddress.getByName(host));
                } catch (Exception e) {
                    System.out.println(e.toString());
                }
            }

            @Override
            public void run() {
                while (true) {
                    try {
                        resolve("ec2-34-252-184-81.eu-west-1.compute.amazonaws.com");
                        TimeUnit.SECONDS.sleep(3);
                    } catch (Exception e) {
                        // Interrupted while sleeping; keep looping.
                    }
                }
            }
        };
        t.start();
        TimeUnit.SECONDS.sleep(500);
    }
}
$ java HostName
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/10.77.39.169
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/10.77.39.169
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/10.77.39.169
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/10.77.39.169
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/10.77.39.169
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/10.77.39.169
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/10.77.39.169
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/10.77.39.169
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/10.77.39.169
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/10.77.39.169
java.net.UnknownHostException: ec2-34-252-184-81.eu-west-1.compute.amazonaws.com: Name or service not known
java.net.UnknownHostException: ec2-34-252-184-81.eu-west-1.compute.amazonaws.com
java.net.UnknownHostException: ec2-34-252-184-81.eu-west-1.compute.amazonaws.com
java.net.UnknownHostException: ec2-34-252-184-81.eu-west-1.compute.amazonaws.com
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/10.77.39.169    <-- recovered here
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/10.77.39.169
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/10.77.39.169
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/10.77.39.169
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/10.77.39.169

Note that this doesn't reproduce 100% of the time, but it does pretty much every time. I'm happy to provide more information on this.
This came out of investigating the bug reported here

@jsvd jsvd changed the title from "threaded calls to InetAddress.getByName stop working after NIC disconnects" to "threaded calls to InetAddress.getByName stop working after NIC disconnects twice" on Mar 28, 2017
@acchen97

acchen97 commented Nov 8, 2017

@enebo any thoughts around this bug and prioritization of a relevant fix?

@enebo
Member

enebo commented Nov 8, 2017

@acchen97 I would not think it would be possible for us to behave differently than Java unless somehow we toggle some environment in some way that causes this (but I cannot think of what that would be). Both of your scripts are calling into the same Java code, but JRuby is indirecting a little bit on the way to make the same Java call. There really should be no difference?

@headius any ideas?

@headius
Member

headius commented Nov 9, 2017

This is truly bizarre. I can't imagine what we'd be doing differently than Java here, since we just use the JDK socket classes currently.

My only thought is that perhaps the way we're requesting the JDK resolve the URL uses some internal JDK cache that gets marked as error and stays error.
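One way to look into the cache idea is to check the JDK's DNS cache settings. The sketch below reads the two JDK security properties that control how long lookups are cached, networkaddress.cache.ttl and networkaddress.cache.negative.ttl, and then sets the negative TTL to 0 so that failed lookups are not cached. The class and method names are made up for illustration. It assumes the override happens before the first InetAddress call, since the JDK reads these values once. The logs in this thread (one exception with a resolver message followed by a few without) look consistent with failures being served from a short-lived negative cache, but that is an inference, not something confirmed here.

```java
import java.security.Security;

/** Sketch: inspect and override the JDK's DNS cache TTL security properties. */
final class DnsCacheSettings {
    // Real JDK security property names; everything else here is illustrative.
    static final String POSITIVE_TTL = "networkaddress.cache.ttl";
    static final String NEGATIVE_TTL = "networkaddress.cache.negative.ttl";

    /** Returns the configured value, or "(unset)" when the property is absent. */
    static String describe(String property) {
        String value = Security.getProperty(property);
        return value == null ? "(unset)" : value;
    }

    /** Asks the JDK not to cache failed lookups; call before the first lookup. */
    static void disableNegativeCaching() {
        Security.setProperty(NEGATIVE_TTL, "0");
    }

    public static void main(String[] args) {
        System.out.println(POSITIVE_TTL + " = " + describe(POSITIVE_TTL));
        System.out.println(NEGATIVE_TTL + " = " + describe(NEGATIVE_TTL));
        disableNegativeCaching();
        System.out.println(NEGATIVE_TTL + " = " + describe(NEGATIVE_TTL));
    }
}
```

If the lookups still fail forever with negative caching disabled, a stuck JDK cache entry becomes a less likely explanation.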

@headius
Member

headius commented Nov 9, 2017

Oh wow, looking at the code again I realize you're actually calling the Java class from Ruby. Yeah this is seriously strange. Same URL, same API call, only difference is that we do it from Ruby.

@headius
Member

headius commented Nov 9, 2017

Ok, so I tried this locally, turning wifi off to trigger the errors, but my run was able to recover. Took a while to start erroring, took a while to stop erroring, but it did both.

$ jruby nic.rb
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/34.252.184.81
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/34.252.184.81
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/34.252.184.81
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/34.252.184.81
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/34.252.184.81
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/34.252.184.81
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/34.252.184.81
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/34.252.184.81
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/34.252.184.81
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/34.252.184.81
java.net.UnknownHostException: ec2-34-252-184-81.eu-west-1.compute.amazonaws.com: nodename nor servname provided, or not known
java.net.UnknownHostException: ec2-34-252-184-81.eu-west-1.compute.amazonaws.com
java.net.UnknownHostException: ec2-34-252-184-81.eu-west-1.compute.amazonaws.com
java.net.UnknownHostException: ec2-34-252-184-81.eu-west-1.compute.amazonaws.com
java.net.UnknownHostException: ec2-34-252-184-81.eu-west-1.compute.amazonaws.com: nodename nor servname provided, or not known
java.net.UnknownHostException: ec2-34-252-184-81.eu-west-1.compute.amazonaws.com
java.net.UnknownHostException: ec2-34-252-184-81.eu-west-1.compute.amazonaws.com
java.net.UnknownHostException: ec2-34-252-184-81.eu-west-1.compute.amazonaws.com
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/34.252.184.81
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/34.252.184.81
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/34.252.184.81
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/34.252.184.81
^C

This is JDK 8u121 on macOS 10.13.

@headius
Member

headius commented Nov 9, 2017

This is Ubuntu Linux 16.04. There are two NIC disables (disable networking from Unity tray) and two reenables. It recovered both times.

$ jruby blah.rb
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/34.252.184.81
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/34.252.184.81
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/34.252.184.81
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/34.252.184.81
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/34.252.184.81
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/34.252.184.81
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/34.252.184.81
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/34.252.184.81
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/34.252.184.81
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/34.252.184.81
java.net.UnknownHostException: ec2-34-252-184-81.eu-west-1.compute.amazonaws.com: Temporary failure in name resolution
java.net.UnknownHostException: ec2-34-252-184-81.eu-west-1.compute.amazonaws.com
java.net.UnknownHostException: ec2-34-252-184-81.eu-west-1.compute.amazonaws.com
java.net.UnknownHostException: ec2-34-252-184-81.eu-west-1.compute.amazonaws.com
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/34.252.184.81
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/34.252.184.81
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/34.252.184.81
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/34.252.184.81
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/34.252.184.81
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/34.252.184.81
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/34.252.184.81
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/34.252.184.81
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/34.252.184.81
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/34.252.184.81
java.net.UnknownHostException: ec2-34-252-184-81.eu-west-1.compute.amazonaws.com: Temporary failure in name resolution
java.net.UnknownHostException: ec2-34-252-184-81.eu-west-1.compute.amazonaws.com
java.net.UnknownHostException: ec2-34-252-184-81.eu-west-1.compute.amazonaws.com
java.net.UnknownHostException: ec2-34-252-184-81.eu-west-1.compute.amazonaws.com
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/34.252.184.81
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/34.252.184.81
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/34.252.184.81
ec2-34-252-184-81.eu-west-1.compute.amazonaws.com/34.252.184.81

@jsvd
Contributor Author

jsvd commented Nov 14, 2017

This is weird, I'm having trouble replicating it myself. That said, I'm still seeing some strange behaviour: after about 5-10 NIC restarts the script terminates silently.

ec2-52-16-41-147.eu-west-1.compute.amazonaws.com/10.77.16.245
ec2-52-16-41-147.eu-west-1.compute.amazonaws.com/10.77.16.245
java.net.UnknownHostException: ec2-52-16-41-147.eu-west-1.compute.amazonaws.com: Name or service not known
java.net.UnknownHostException: ec2-52-16-41-147.eu-west-1.compute.amazonaws.com
java.net.UnknownHostException: ec2-52-16-41-147.eu-west-1.compute.amazonaws.com
java.net.UnknownHostException: ec2-52-16-41-147.eu-west-1.compute.amazonaws.com
java.net.UnknownHostException: ec2-52-16-41-147.eu-west-1.compute.amazonaws.com: Name or service not known
java.net.UnknownHostException: ec2-52-16-41-147.eu-west-1.compute.amazonaws.com
java.net.UnknownHostException: ec2-52-16-41-147.eu-west-1.compute.amazonaws.com
java.net.UnknownHostException: ec2-52-16-41-147.eu-west-1.compute.amazonaws.com
ec2-52-16-41-147.eu-west-1.compute.amazonaws.com/10.77.16.245
ec2-52-16-41-147.eu-west-1.compute.amazonaws.com/10.77.16.245
ec2-52-16-41-147.eu-west-1.compute.amazonaws.com/10.77.16.245
ec2-52-16-41-147.eu-west-1.compute.amazonaws.com/10.77.16.245
ec2-52-16-41-147.eu-west-1.compute.amazonaws.com/10.77.16.245
ec2-52-16-41-147.eu-west-1.compute.amazonaws.com/10.77.16.245
ec2-52-16-41-147.eu-west-1.compute.amazonaws.com/10.77.16.245
ec2-52-16-41-147.eu-west-1.compute.amazonaws.com/10.77.16.245
ec2-52-16-41-147.eu-west-1.compute.amazonaws.com/10.77.16.245
ec2-52-16-41-147.eu-west-1.compute.amazonaws.com/10.77.16.245
java.net.UnknownHostException: ec2-52-16-41-147.eu-west-1.compute.amazonaws.com: Name or service not known
java.net.UnknownHostException: ec2-52-16-41-147.eu-west-1.compute.amazonaws.com
java.net.UnknownHostException: ec2-52-16-41-147.eu-west-1.compute.amazonaws.com
java.net.UnknownHostException: ec2-52-16-41-147.eu-west-1.compute.amazonaws.com
java.net.UnknownHostException: ec2-52-16-41-147.eu-west-1.compute.amazonaws.com: Name or service not known
java.net.UnknownHostException: ec2-52-16-41-147.eu-west-1.compute.amazonaws.com
java.net.UnknownHostException: ec2-52-16-41-147.eu-west-1.compute.amazonaws.com
[ec2-user@ip-10-77-16-245 ~]$

@headius
Member

headius commented Nov 14, 2017

In any case, I am leaning toward this not being our issue. We are really just using JDK APIs to look up addresses, and if it gives up there's not a lot we could do to fix it.
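For transient failures like the ones in the logs above, which do clear up once the NIC is back, an application can retry around the JDK call. The sketch below is one way to do that, not something from JRuby or the JDK: RetryingResolver, the Lookup interface and resolveWithRetry are hypothetical names, and the attempt count and delay are arbitrary. It would not help if the resolver never recovers.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Sketch: retry a hostname lookup a fixed number of times before giving up. */
final class RetryingResolver {
    /** Hypothetical seam so the real lookup can be swapped out in tests. */
    interface Lookup {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    static InetAddress resolveWithRetry(Lookup lookup, String host,
                                        int maxAttempts, long delayMillis)
            throws UnknownHostException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        UnknownHostException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return lookup.resolve(host);
            } catch (UnknownHostException e) {
                last = e;
                System.err.println("attempt " + attempt + " failed: " + e);
                // Sleep between attempts, but not after the final one.
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        throw last; // every attempt failed; surface the most recent error
    }

    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        // Five attempts, three seconds apart, using the real JDK lookup.
        System.out.println(resolveWithRetry(InetAddress::getByName, host, 5, 3000));
    }
}
```

Passing the lookup in as a parameter keeps the retry logic testable without a network.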

@jsvd
Contributor Author

jsvd commented Nov 14, 2017

++ thanks for taking the time to investigate. closing now, I will reopen if I find new information.

@jsvd jsvd closed this as completed Nov 14, 2017
@enebo enebo added this to the Invalid or Duplicate milestone Dec 7, 2017