File descriptor leak of type 'PIPE' when 'net/http' used to do a GET #2326
So the likely candidate for the pipe+eventpoll descriptor pair is the selection logic used for most IO operations. "Select" at the JVM level uses whatever's appropriate for the platform, e.g. kqueue or epoll, and it also injects a pipe into the selected descriptors as a wake-up mechanism. That's these two descriptors. If they are lingering, it means some IO operation is opening a new selector and not closing it. That may be a leak... but we do also cache these selectors to avoid the overhead of closing and reopening them for every read or write. I suspect you are seeing the cached selectors. If that's not the case, then you should see the number of file descriptors explode as you perform more IO operations (e.g. your script in a loop). The number of cached selector FD pairs should be roughly the same as the number of threads you have performing IO operations. Confirm?
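One way to tell a bounded cache from a true leak, as suggested above, is to count open descriptors across repeated operations. A minimal sketch of that check (assumptions: Linux, where `/proc/self/fd` lists the process's open descriptors; `IO.pipe` stands in here for the pipe each selector creates, so no network access is needed):

```ruby
# Count this process's open file descriptors (Linux-specific: /proc/self/fd).
def fd_count
  Dir.children('/proc/self/fd').size
end

before = fd_count
pipes = 5.times.map { IO.pipe }  # each IO.pipe opens two descriptors
after  = fd_count

# A cache plateaus at the concurrency level; a real leak grows on every
# iteration. Here the delta is exactly 2 FDs per pipe opened.
puts after - before
```

Running the real GET script in a loop and graphing the count over time would distinguish "cached and stable" from "growing without bound".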
I believe you are correct, that these are the cached descriptors. In the case of a single-threaded application, I see two sets cached. But if I use multiple threads, it appears each thread gets its own cache, and those linger. So if I have a script that uses three threads that each do a GET, I now have 6 descriptors (3 sets, 1 per thread) lingering, even though my threads have died. So, for example, this:

```ruby
require 'net/http'
require 'uri'
require 'thread'
require 'jruby'

def get_data
  uri = URI.parse("http://google.com/")
  response = Net::HTTP.get_response(uri)
end

threads = []
3.times {
  threads << Thread.new {
    get_data
  }
}
threads.each { |t|
  t.join
}

JRuby.gc
p "Done check your lsof!"
sleep 1000
```

Results in this:
And if I do 4 threads, then I get 8, and so on. This would appear to be a leak, since they are no longer associated with a live thread, and neither a forced GC nor GCs over time (as I have observed on production systems) seem to clean them up.
Is this related?
The cached selectors are not per-thread...they are just a simple cache. Threads terminating would not eliminate unused selectors, nor should we have a selector per thread (which could lead to thousands of live-but-idle selectors). You're just seeing the effect of those N threads needing M concurrent selectors. Now, perhaps they should be cleaned up, but it's not a leak in the sense of unbounded growth; the number of cached selectors will only grow to the level of concurrent select operations happening. Cleaning them up would require modifying our selector cache to have some sort of weaker reference, probably a soft reference, so that periodically old selectors would be GCed and closed.
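The weak-reference idea can be sketched in Ruby terms (an illustration only: Ruby's `WeakRef` stands in for the JVM's `SoftReference`, and `Object.new` stands in for a `java.nio.channels.Selector`; a real implementation would also have to close the selector once the reference clears):

```ruby
require 'weakref'

# A cache that holds selectors only weakly, so the GC may reclaim idle ones.
selector = Object.new            # stand-in for a real selector
cache = [WeakRef.new(selector)]

# While any strong reference ("selector") exists, the cache entry stays live
# and can be checked out for reuse.
alive = cache.first.weakref_alive?

# Once all strong references are dropped, a later GC may clear the entry;
# at that point the pool would close the underlying selector's FDs.
```

The catch, as noted below, is that clearing happens on the GC's schedule, not the pool's.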
@johntdyer Yes, it's the cache of which I speak. We have never heard about any problems caused by hard-caching the selectors, so we never took the next step to make them clean up after a certain amount of time. |
Our use case is a bit different for JRuby/JVM, in that we keep a ScriptContainer warm and allow additional scripts to be eval'd within it. If those scripts use threads that do I/O, then of course the descriptors linger between evals and create a file descriptor leak. This would also occur for any developer using short-lived I/O threads in JRuby. Thread pools are always better, of course, but there are certain contexts where that may not apply.
The pooling of selectors is not done per-thread; it is just a list in memory from which all threads check selectors in and out. If you send 100 threads into that container but none of them run at the same time, then you should only see one or two selectors cached. If that's not the case, then it would indeed be a bug. The number of cached, open selectors at any given time should reflect the peak number of concurrent IO operations that have needed selectors at the same time. I'm exploring what would be required to make these selectors lazily tidy up, but I am still not sure I've heard evidence that this is really a "leak".
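The checkin/checkout behavior described above can be modeled with a short sketch (a simplified illustration, not JRuby's actual SelectorPool; `Object.new` stands in for a selector):

```ruby
# Simplified selector pool: a plain shared list, checked in and out by
# all threads. It grows only when checkouts overlap.
class SimplePool
  def initialize
    @free = []
    @opened = 0
  end

  # Reuse a cached selector if one is free; otherwise open a new one.
  def checkout
    @free.pop || begin
      @opened += 1
      Object.new
    end
  end

  # Return the selector for reuse; the pool never shrinks.
  def checkin(selector)
    @free << selector
  end

  attr_reader :opened
  def idle = @free.size
end

pool = SimplePool.new

# Sequential use: every operation reuses the single cached selector.
3.times { s = pool.checkout; pool.checkin(s) }   # opened stays at 1

# Overlapping use: three simultaneous checkouts force three selectors open,
# and all three then linger in the cache after checkin.
held = 3.times.map { pool.checkout }
held.each { |s| pool.checkin(s) }                # opened is now 3
```

This matches the reports above: sequential threads leave one cached pair, while N concurrent GETs leave N cached pairs.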
Indeed, if you launch the threads sequentially it does only produce one set of cached descriptors:

```ruby
require 'net/http'
require 'uri'
require 'thread'
require 'jruby'

def get_data
  uri = URI.parse("http://google.com/")
  response = Net::HTTP.get_response(uri)
end

3.times {
  thread = Thread.new {
    get_data
  }
  thread.join
}

JRuby.gc
p "Done check your lsof!"
sleep 1000
```

lsof output:
But it is reasonable that if a developer is spinning up a thread for each I/O operation, those operations may happen concurrently over time and therefore cache additional descriptors over time. We are seeing what we believe is a 'leak' over time because of this. Of course, we may begin to push developers toward a proper queue/pool, but we can't guarantee that.
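The queue/pool pattern mentioned above keeps the number of I/O threads (and therefore cached selectors) bounded regardless of job count. A minimal sketch (the doubling stands in for the real I/O work, e.g. an HTTP GET):

```ruby
require 'thread'

jobs    = Queue.new
results = Queue.new

# A fixed pool of two workers services all jobs, so at most two selectors
# would ever need to be cached, no matter how many jobs run.
workers = 2.times.map do
  Thread.new do
    while (job = jobs.pop) != :stop
      results << job * 2   # stand-in for the real I/O work
    end
  end
end

5.times { |i| jobs << i }
2.times { jobs << :stop }  # one stop marker per worker
workers.each(&:join)
# Five jobs completed on only two threads.
```

Compare this with the thread-per-request scripts earlier in the thread, where the cached descriptor count tracks the peak thread count.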
Yes, I agree we could be doing more to clean up selectors that are no longer in use, but implementing that is rather complicated. The pool is structured like this:
In order to make it lazily clean up selectors:
This is not a trivial data structure. I'm playing with some options now.
I think perhaps our best option is actually to provide a way to limit the size of that pool. The main issue with cleaning it up is that we want to avoid having some background thread harvesting old selectors, which means leaning on JVM finalization. Finalization does not run predictably; obviously it will run in response to explicit GCs, but otherwise selectors might hang around for a very long time. In addition to using finalization, we would need to softly reference the selectors so that they could be collected and fed into finalization. Add to that the fact that our cache is two-level -- per NIO provider -- and it starts to cross a line for me. This will probably turn off @enebo, but the simpler option here would be to have a flag limiting how many selectors are pooled. I know it's not a great option, since it will require individual tuning if you actually want to use it. Another alternative for you would be to clear the SelectorPool (ruby.getSelectorPool(), I believe) periodically. That will go through and close all pooled selectors for all providers. Thoughts?
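The size-limited pool proposed above could look roughly like this (a sketch of the idea only, not an existing JRuby option; `FakeSelector` and its `close!` method are stand-ins for a real selector and `Selector#close`):

```ruby
# Pool that caps how many idle selectors it retains. On checkin, anything
# over the cap is closed immediately instead of cached, releasing its FDs.
FakeSelector = Struct.new(:closed) do
  def close!
    self.closed = true
  end
end

class BoundedPool
  def initialize(max_idle)
    @max_idle = max_idle
    @free = []
  end

  def checkout
    @free.pop || FakeSelector.new(false)
  end

  def checkin(selector)
    if @free.size < @max_idle
      @free << selector   # under the cap: keep it for reuse
    else
      selector.close!     # over the cap: release the pipe/eventpoll FDs now
    end
  end

  def idle = @free.size
end

pool = BoundedPool.new(2)
held = 4.times.map { pool.checkout }
held.each { |s| pool.checkin(s) }
# Only 2 selectors stay cached; the other 2 were closed on checkin.
```

This trades a little reopening overhead under bursts for a hard ceiling on lingering descriptors, which is exactly the tuning knob discussed above.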
The ruby.getSelectorPool() approach could be a good solution. The question I have on that one is: within a script, is there a way to get the 'self' (aka the org.jruby.Ruby runtime) object? Or is that possible only from the Java code that is managing the Ruby object?
Ok, so doing this from Java on the Ruby object seems to do the cleanup, which appears to be a workable solution. I do this from within Java:

```java
package com.tropo.jruby;

import java.io.File;
import java.io.IOException;

import org.apache.commons.io.FileUtils;
import org.jruby.Ruby;
import org.jruby.util.io.SelectorPool;

public class Playground {
    public static void main(String[] args) throws IOException, InterruptedException {
        Ruby ruby = Ruby.newInstance();
        SelectorPool selectorPool = ruby.getSelectorPool();
        File script = new File("/Users/jsgoecke/Desktop/loop.rb");
        ruby.evalScriptlet(FileUtils.readFileToString(script));
        System.out.println("Done with script");
        Thread.sleep(10000);
        selectorPool.cleanup();
        System.out.println("Clean now");
        Thread.sleep(30000);
    }
}
```

This is the Ruby script it evals:

```ruby
require 'net/http'
require 'uri'
require 'thread'

def get_data
  uri = URI.parse("http://google.com/")
  response = Net::HTTP.get_response(uri)
end

threads = []
5.times {
  threads << Thread.new {
    get_data
  }
}
threads.each { |t|
  t.join
}

p "Done check your lsof!"
sleep 10
```

Before I call `selectorPool.cleanup()`, I see these selector pools:
Then after, it shows they are gone:
I would still be curious whether you can reference the Ruby object from within the script, but I am guessing this would break the security model.
Within a script, the current JRuby runtime (org.jruby.Ruby) is available as JRuby.runtime. Normal Java integration niceties apply. If you're satisfied with that solution, we'll close this as Won't Fix for now. |
This works a charm:

```ruby
require 'net/http'
require 'uri'
require 'thread'
require 'jruby'

def get_data
  uri = URI.parse("http://google.com/")
  response = Net::HTTP.get_response(uri)
end

threads = []
4.times {
  threads << Thread.new {
    get_data
  }
}
threads.each { |t|
  t.join
}

JRuby.runtime.getSelectorPool.cleanup
p "Done check your lsof!"
sleep 1000
```

Thank you for all of the attention and help on this one, much appreciated!
It appears that under OS X, when I open a socket and it then closes (either through a :read_timeout or a successful read), it leaks a 'PIPE' file descriptor. On Linux it leaks both 'PIPE' and 'eventpoll' descriptors.
I am doing something like this in JRuby-1.7.17:
And then I get this before running the script:
And then after running the script:
I did find this issue:
#221
And tried running the Java GC manually with:
But the FDs seem to hang around.