Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible ChannelFD leak in FilenoUtil? #4796

Closed
seanguo opened this issue Sep 25, 2017 · 31 comments
Closed

Possible ChannelFD leak in FilenoUtil? #4796

seanguo opened this issue Sep 25, 2017 · 31 comments

Comments

@seanguo
Copy link

seanguo commented Sep 25, 2017

Environment

JRuby 9.1.13.0
Linux
Using as embedded JSR223 engine

We upgraded from 1.7.21 to 9.1.13.0 to try to address the issue in #4446. But after a simple load test we found that the map in FilenoUtil is accumulating during the test. So I wrote the following simple script to try to reproduce:

Sample Code Snippet

public class Test {
  public static void main(String[] args) throws InterruptedException, ScriptException, ClassNotFoundException, IOException {
    ScriptEngineManager scriptEngineManager = new ScriptEngineManager();
    final ScriptEngine engine = scriptEngineManager.getEngineByName("jruby");
    final ScriptContext scriptContext = new SimpleScriptContext();
    scriptContext.setAttribute("org.jruby.embed.termination", true, ScriptContext.ENGINE_SCOPE);

    engine.eval("print 'test\n'", scriptContext);

    System.gc();
    Thread.sleep(5000);

    System.out.println(Ruby.getGlobalRuntime().getFilenoUtil().getNumberOfWrappers());

  }
}

Unexpected Behavior

I got the following output:
test
3
Shouldn't the number be 0?

Is this an issue of the new code or my usage is incorrect for the new update?
Thanks.

@seanguo
Copy link
Author

seanguo commented Sep 25, 2017

I also tried to dump the thread stack in the register method and unregister method, but it seems no unregister is invoked.

java.lang.Exception: Stack trace
	at java.lang.Thread.dumpStack(Thread.java:1365)
	at org.jruby.util.io.FilenoUtil.registerWrapper(FilenoUtil.java:82)
	at org.jruby.util.io.ChannelFD.<init>(ChannelFD.java:38)
	at org.jruby.util.io.OpenFile.setChannel(OpenFile.java:190)
	at org.jruby.RubyIO.prepIO(RubyIO.java:255)
	at org.jruby.RubyIO.prepStdio(RubyIO.java:184)
	at org.jruby.RubyGlobal.initSTDIO(RubyGlobal.java:274)
	at org.jruby.RubyGlobal.createGlobals(RubyGlobal.java:202)
	at org.jruby.Ruby.init(Ruby.java:1195)
	at org.jruby.Ruby.newInstance(Ruby.java:339)
	at org.jruby.embed.internal.AbstractLocalContextProvider.getGlobalRuntime(AbstractLocalContextProvider.java:82)
	at org.jruby.embed.internal.SingletonLocalContextProvider.getRuntime(SingletonLocalContextProvider.java:99)
	at org.jruby.embed.jsr223.Utils.setWriter(Utils.java:149)
	at org.jruby.embed.jsr223.Utils.preEval(Utils.java:113)
	at org.jruby.embed.jsr223.JRubyEngine.eval(JRubyEngine.java:88)
	at org.jruby.Test.main(Test.java:21)
java.lang.Exception: Stack trace
	at java.lang.Thread.dumpStack(Thread.java:1365)
	at org.jruby.util.io.FilenoUtil.registerWrapper(FilenoUtil.java:82)
	at org.jruby.util.io.ChannelFD.<init>(ChannelFD.java:38)
	at org.jruby.util.io.OpenFile.setChannel(OpenFile.java:190)
	at org.jruby.RubyIO.prepIO(RubyIO.java:255)
	at org.jruby.RubyIO.prepStdio(RubyIO.java:206)
	at org.jruby.RubyGlobal.initSTDIO(RubyGlobal.java:276)
	at org.jruby.RubyGlobal.createGlobals(RubyGlobal.java:202)
	at org.jruby.Ruby.init(Ruby.java:1195)
	at org.jruby.Ruby.newInstance(Ruby.java:339)
	at org.jruby.embed.internal.AbstractLocalContextProvider.getGlobalRuntime(AbstractLocalContextProvider.java:82)
	at org.jruby.embed.internal.SingletonLocalContextProvider.getRuntime(SingletonLocalContextProvider.java:99)
	at org.jruby.embed.jsr223.Utils.setWriter(Utils.java:149)
	at org.jruby.embed.jsr223.Utils.preEval(Utils.java:113)
	at org.jruby.embed.jsr223.JRubyEngine.eval(JRubyEngine.java:88)
	at org.jruby.Test.main(Test.java:21)
java.lang.Exception: Stack trace
	at java.lang.Thread.dumpStack(Thread.java:1365)
	at org.jruby.util.io.FilenoUtil.registerWrapper(FilenoUtil.java:82)
	at org.jruby.util.io.ChannelFD.<init>(ChannelFD.java:38)
	at org.jruby.util.io.OpenFile.setChannel(OpenFile.java:190)
	at org.jruby.RubyIO.prepIO(RubyIO.java:255)
	at org.jruby.RubyIO.prepStdio(RubyIO.java:206)
	at org.jruby.RubyGlobal.initSTDIO(RubyGlobal.java:278)
	at org.jruby.RubyGlobal.createGlobals(RubyGlobal.java:202)
	at org.jruby.Ruby.init(Ruby.java:1195)
	at org.jruby.Ruby.newInstance(Ruby.java:339)
	at org.jruby.embed.internal.AbstractLocalContextProvider.getGlobalRuntime(AbstractLocalContextProvider.java:82)
	at org.jruby.embed.internal.SingletonLocalContextProvider.getRuntime(SingletonLocalContextProvider.java:99)
	at org.jruby.embed.jsr223.Utils.setWriter(Utils.java:149)
	at org.jruby.embed.jsr223.Utils.preEval(Utils.java:113)
	at org.jruby.embed.jsr223.JRubyEngine.eval(JRubyEngine.java:88)
	at org.jruby.Test.main(Test.java:21)
java.lang.Exception: Stack trace
	at java.lang.Thread.dumpStack(Thread.java:1365)
	at org.jruby.util.io.FilenoUtil.registerWrapper(FilenoUtil.java:82)
	at org.jruby.util.io.ChannelFD.<init>(ChannelFD.java:39)
	at org.jruby.RubyIO.<init>(RubyIO.java:138)
	at org.jruby.embed.jsr223.Utils.getRubyIO(Utils.java:179)
	at org.jruby.embed.jsr223.Utils.setWriter(Utils.java:150)
	at org.jruby.embed.jsr223.Utils.preEval(Utils.java:113)
	at org.jruby.embed.jsr223.JRubyEngine.eval(JRubyEngine.java:88)
	at org.jruby.Test.main(Test.java:21)
java.lang.Exception: Stack trace
	at java.lang.Thread.dumpStack(Thread.java:1365)
	at org.jruby.util.io.FilenoUtil.registerWrapper(FilenoUtil.java:82)
	at org.jruby.util.io.ChannelFD.<init>(ChannelFD.java:39)
	at org.jruby.RubyIO.<init>(RubyIO.java:138)
	at org.jruby.embed.jsr223.Utils.getRubyIO(Utils.java:179)
	at org.jruby.embed.jsr223.Utils.setErrorWriter(Utils.java:171)
	at org.jruby.embed.jsr223.Utils.preEval(Utils.java:114)
	at org.jruby.embed.jsr223.JRubyEngine.eval(JRubyEngine.java:88)
	at org.jruby.Test.main(Test.java:21)
test
java.lang.Exception: Stack trace
	at java.lang.Thread.dumpStack(Thread.java:1365)
	at org.jruby.util.io.FilenoUtil.registerWrapper(FilenoUtil.java:82)
	at org.jruby.util.io.ChannelFD.<init>(ChannelFD.java:38)
	at org.jruby.util.io.OpenFile.setChannel(OpenFile.java:190)
	at org.jruby.RubyIO.prepIO(RubyIO.java:255)
	at org.jruby.RubyIO.prepStdio(RubyIO.java:184)
	at org.jruby.RubyGlobal.initSTDIO(RubyGlobal.java:274)
	at org.jruby.RubyGlobal.createGlobals(RubyGlobal.java:202)
	at org.jruby.Ruby.init(Ruby.java:1195)
	at org.jruby.Ruby.newInstance(Ruby.java:339)
	at org.jruby.embed.internal.AbstractLocalContextProvider.getGlobalRuntime(AbstractLocalContextProvider.java:82)
	at org.jruby.embed.internal.SingletonLocalContextProvider.getRuntime(SingletonLocalContextProvider.java:99)
	at org.jruby.embed.ScriptingContainer.terminate(ScriptingContainer.java:1858)
	at org.jruby.embed.jsr223.JRubyEngine.eval(JRubyEngine.java:98)
	at org.jruby.Test.main(Test.java:21)
java.lang.Exception: Stack trace
	at java.lang.Thread.dumpStack(Thread.java:1365)
	at org.jruby.util.io.FilenoUtil.registerWrapper(FilenoUtil.java:82)
	at org.jruby.util.io.ChannelFD.<init>(ChannelFD.java:38)
	at org.jruby.util.io.OpenFile.setChannel(OpenFile.java:190)
	at org.jruby.RubyIO.prepIO(RubyIO.java:255)
	at org.jruby.RubyIO.prepStdio(RubyIO.java:206)
	at org.jruby.RubyGlobal.initSTDIO(RubyGlobal.java:276)
	at org.jruby.RubyGlobal.createGlobals(RubyGlobal.java:202)
	at org.jruby.Ruby.init(Ruby.java:1195)
	at org.jruby.Ruby.newInstance(Ruby.java:339)
	at org.jruby.embed.internal.AbstractLocalContextProvider.getGlobalRuntime(AbstractLocalContextProvider.java:82)
	at org.jruby.embed.internal.SingletonLocalContextProvider.getRuntime(SingletonLocalContextProvider.java:99)
	at org.jruby.embed.ScriptingContainer.terminate(ScriptingContainer.java:1858)
	at org.jruby.embed.jsr223.JRubyEngine.eval(JRubyEngine.java:98)
	at org.jruby.Test.main(Test.java:21)
java.lang.Exception: Stack trace
	at java.lang.Thread.dumpStack(Thread.java:1365)
	at org.jruby.util.io.FilenoUtil.registerWrapper(FilenoUtil.java:82)
	at org.jruby.util.io.ChannelFD.<init>(ChannelFD.java:38)
	at org.jruby.util.io.OpenFile.setChannel(OpenFile.java:190)
	at org.jruby.RubyIO.prepIO(RubyIO.java:255)
	at org.jruby.RubyIO.prepStdio(RubyIO.java:206)
	at org.jruby.RubyGlobal.initSTDIO(RubyGlobal.java:278)
	at org.jruby.RubyGlobal.createGlobals(RubyGlobal.java:202)
	at org.jruby.Ruby.init(Ruby.java:1195)
	at org.jruby.Ruby.newInstance(Ruby.java:339)
	at org.jruby.embed.internal.AbstractLocalContextProvider.getGlobalRuntime(AbstractLocalContextProvider.java:82)
	at org.jruby.embed.internal.SingletonLocalContextProvider.getRuntime(SingletonLocalContextProvider.java:99)
	at org.jruby.embed.ScriptingContainer.terminate(ScriptingContainer.java:1858)
	at org.jruby.embed.jsr223.JRubyEngine.eval(JRubyEngine.java:98)
	at org.jruby.Test.main(Test.java:21)
3

@snood1205
Copy link

If it helps at all, running jruby-9.1.13.0 on my mac gives me the following related warning message on all runs

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.jruby.util.io.FilenoUtil to method sun.nio.ch.SelChImpl.getFD()
WARNING: Please consider reporting this to the maintainers of org.jruby.util.io.FilenoUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release

@petrvalkoun-gooddata
Copy link

petrvalkoun-gooddata commented Oct 24, 2017

fully updated jruby 9.1.13.0 on windows 8.1 x64 with jdk-9.0.2 x64 gives me following error:

gem update
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.jruby.util.io.FilenoUtil to method sun.nio.ch.SelChImpl.getFD()
WARNING: Please consider reporting this to the maintainers of org.jruby.util.io.FilenoUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
ERROR: Loading command: update (LoadError)
Unsupported platform: x86_64-windows
ERROR: While executing gem ... (NoMethodError)
undefined method `invoke_with_build_args' for nil:NilClass

java --version
java 9.0.1
Java(TM) SE Runtime Environment (build 9.0.1+11)
Java HotSpot(TM) 64-Bit Server VM (build 9.0.1+11, mixed mode)

jruby --version
jruby 9.1.13.0 (2.3.3) 2017-09-06 8e1c115 Java HotSpot(TM) 64-Bit Server VM 9.0.1+11 on 9.0.1+11 +jit [mswin32-x86_64]

@headius
Copy link
Member

headius commented Oct 30, 2017

@seanquo There certainly could be a leak, but your test may not quite rigorous enough to prove it. The fake filenos acquired by the container would likely be attached to IO objects that need to finalize. Assuming the container and everything it references goes away, there may be a couple GC cycles needed to fully unregister those filenos.

If, however, you can show that the fake filenos continue to pile up under heavy load with multiple GC cycles, then we may have a problem. We have (or used to have) tests for fileno leaks, but I'm not sure if they're running.

@seanguo
Copy link
Author

seanguo commented Oct 31, 2017

@headius Actually we found this issue during our load test. The heap continues grow and full GC won't help on reclaiming the memory. So we made a heap dump which shows the leak points of the file descriptor. That's why I created this issue and attached the small program for you to reproduce.

@headius
Copy link
Member

headius commented Oct 31, 2017

Complete test app with imports:

import org.jruby.Ruby;

import javax.script.ScriptContext;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;
import javax.script.SimpleScriptContext;
import java.io.IOException;

public class Test {
  public static void main(String[] args) throws InterruptedException, ScriptException, ClassNotFoundException, IOException {
    ScriptEngineManager scriptEngineManager = new ScriptEngineManager();
    final ScriptEngine engine = scriptEngineManager.getEngineByName("jruby");
    final ScriptContext scriptContext = new SimpleScriptContext();
    scriptContext.setAttribute("org.jruby.embed.termination", true, ScriptContext.ENGINE_SCOPE);

    engine.eval("print 'test\n'", scriptContext);

    System.gc();
    Thread.sleep(5000);

    System.out.println(Ruby.getGlobalRuntime().getFilenoUtil().getNumberOfWrappers());

  }
}

@headius
Copy link
Member

headius commented Oct 31, 2017

Ok, so there's a few problems here.

First off, the engine never goes out of scope, so I wouldn't expect it to go away. If the engine doesn't go away, the fake filenos registered by JRuby to serve that engine won't go away either.

Second, because the JSR-223 scripting logic in JRuby uses the "global" runtime, that runtime never goes out of scope and remains referenced. As a result, the IO objects that wrap those fake filenos (in this case, for in/out/err) never get collected.

I modified your script to loop 100 times over the script engine logic, creating and executing 100 engines. The count of fileno does not go beyond 3, regardless of how many engines execute.

I also made a Ruby script that generates hundreds of fake filenos. It does not appear to leak, and the filenos do get unregistered.

loop do
  100.times do
    java.io.ByteArrayOutputStream.new.to_io
  end
  JRuby.gc
  p JRuby.runtime.fileno_util.number_of_wrappers

If you would like finer-grained control over scripting JRuby from Java, including the ability to use multiple, nonglobal runtimes and to shut down the instance when you're done, I would recommend you look at our embedding API in org.jruby.embed.

@headius headius closed this as completed Oct 31, 2017
@headius
Copy link
Member

headius commented Oct 31, 2017

I'm closing this because there's no evidence that we're leaking fake fileno. If you are seeing leaks in your application, I suspect it's because you're using JSR-223 engines, registering a bunch of new IO objects, and then never cleaning them up.

Your workarounds are:

  • Clean up IOs created during each engine's lifespan before walking away from the engine.
  • Use the JRuby embedding API to isolate the engines and shut them down when done (or make it possible for them to GC), so their resources will be reclaimed.

@headius headius added this to the Invalid or Duplicate milestone Oct 31, 2017
@seanguo
Copy link
Author

seanguo commented Nov 3, 2017

This is used to be fine with 1.7.27. So the implementation changes in 9.x. @headius Can you share more details on your first workaround:

Clean up IOs created during each engine's lifespan before walking away from the engine.

Thanks.

@seanguo
Copy link
Author

seanguo commented Dec 20, 2017

@headius This become a blocking issue for us now. I understood what you said that JSR-223 engines are sharing the same global runtime and which makes the IO objects failed to be garbage collected.
For your first workaround suggestion, I do not think it's easy for us as your API consumer to remember and clean up those IOs before walking away from the engine. In the 1.7.x days we used to use
scriptContext.setAttribute("org.jruby.embed.termination", true, ScriptContext.ENGINE_SCOPE);
to tell the engine to do the clean up after each script execution. Is that possible for you to do the IO objects clean up in the same manner.

And for your second suggestions. It's also not an easy solution for us. We use the JSR223 API as an abstract layer to invoke multiple script engines. Using JRuby's embedding API will make the code much more complicated. Also as you are the guys who understand that part much better, is that possible for you to provider a solution on the JSR223 level and let us isolate the engines/runtime?

Please suggest.

Thanks.

@oozzal
Copy link

oozzal commented Jan 9, 2018

@headius This happened to us as well when upgrading to 9.1.15.0. We had to downgrade to 9.1.12.0 to run without any issue. Here are a couple of heap dump snapshots from jProfiler:
screen shot 2018-01-09 at 8 01 24 am

screen shot 2018-01-09 at 8 07 00 am

@oozzal
Copy link

oozzal commented Jan 11, 2018

@seanguo were you able to find the issue / fix ?

@tcheng
Copy link

tcheng commented Jan 16, 2018

Hi @headius, in addition to @seanguo's comment we use the singlethread context type not the default singleton.

We find the leak issue only happens while using a pool to reuse the JRubyEngine. Here is the snapshot of heap dump we captured,
screen shot 2018-01-16 at 8 17 28 pm

Try to create JRubyEngine every time instead of using pool can fix the leak issue, but introduces some occasional IO errors like below,

Caused by: java.lang.NullPointerException: Inflater has been closed
 at java.util.zip.Inflater.ensureOpen(Inflater.java:389)
 at java.util.zip.Inflater.inflate(Inflater.java:257)
 at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:152)
 at java.io.FilterInputStream.read(FilterInputStream.java:133)
 at java.io.FilterInputStream.read(FilterInputStream.java:107)
 at org.jruby.runtime.load.LoadServiceResourceInputStream.bufferEntireStream(LoadServiceResourceInputStream.java:53)
 at org.jruby.runtime.load.LoadServiceResourceInputStream.<init>(LoadServiceResourceInputStream.java:36)
 at org.jruby.runtime.load.LibrarySearcher$ResourceLibrary.load(LibrarySearcher.java:243)
 at org.jruby.runtime.load.LibrarySearcher$FoundLibrary.load(LibrarySearcher.java:34)
 at org.jruby.runtime.load.LoadService.load(LoadService.java:343)
 at org.jruby.RubyKernel.loadCommon(RubyKernel.java:981)
 at org.jruby.RubyKernel.load19(RubyKernel.java:973)
 at org.jruby.RubyKernel$INVOKER$s$0$1$load19.call(RubyKernel$INVOKER$s$0$1$load19.gen)
 at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:204)
 at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:200)
 at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:338)
 at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:163)
 at org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:314)
 at org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:73)
 at org.jruby.ir.interpreter.Interpreter.INTERPRET_ROOT(Interpreter.java:112)
 at org.jruby.ir.interpreter.Interpreter.execute(Interpreter.java:99)
 at org.jruby.ir.interpreter.Interpreter.execute(Interpreter.java:35)
 at org.jruby.ir.IRTranslator.execute(IRTranslator.java:42)
 at org.jruby.Ruby.runInterpreter(Ruby.java:840)
 at org.jruby.Ruby.loadFile(Ruby.java:2877)
 at org.jruby.runtime.load.ExternalScript.load(ExternalScript.java:58)
 at org.jruby.runtime.load.LoadService.loadFromClassLoader(LoadService.java:371)
 at org.jruby.Ruby.initRubyKernel(Ruby.java:1738)
 at org.jruby.Ruby.init(Ruby.java:1217)
 at org.jruby.Ruby.newInstance(Ruby.java:339)
 at org.jruby.embed.internal.LocalContext.getRuntime(LocalContext.java:117)
 at org.jruby.embed.internal.SingleThreadLocalContextProvider.getRuntime(SingleThreadLocalContextProvider.java:61)
 at org.jruby.embed.internal.BiVariableMap.getRuntime(BiVariableMap.java:109)
 at org.jruby.embed.internal.BiVariableMap.getTopSelf(BiVariableMap.java:241)
 at org.jruby.embed.internal.BiVariableMap.getReceiverObject(BiVariableMap.java:237)
 at org.jruby.embed.internal.BiVariableMap.put(BiVariableMap.java:332)
 at org.jruby.embed.ScriptingContainer.put(ScriptingContainer.java:1185)
 at org.jruby.embed.jsr223.Utils.put(Utils.java:234)
 at org.jruby.embed.jsr223.Utils.preEval(Utils.java:109)
 at org.jruby.embed.jsr223.JRubyEngine.eval(JRubyEngine.java:88)

We have been working on the leak issue couples of days and tried several solutions but still don't have any clue. Look forward to your suggestions. Thanks.

@tcheng
Copy link

tcheng commented Jan 18, 2018

@oozzal , I tried 9.1.12.0 but the memory leak issue still exists in our product. I assume it's because 9.1.12.0 doesn't include the fix for #4446.

@fxposter
Copy link
Contributor

We are see leaks similar to @oozzal and downgrading to 9.1.12.0 actually fixes them for us.

@enebo
Copy link
Member

enebo commented Mar 14, 2018

Two reporters show a leak appearing between 9.1.12 -> .13. Re-opening as regression.

@fxposter
Copy link
Contributor

Clarification - we downgraded from 9.1.16.0

@enebo
Copy link
Member

enebo commented Mar 14, 2018

@fxposter so .13 might not be where the regression started. If you have a chance to see when it started that will be helpful.

@headius
Copy link
Member

headius commented Mar 15, 2018

@fxposter If you can provide a heap dump of a badly-leaked application it would also help us sort out how the leak is happening.

At the moment we have no good way to reproduce this, since all reports are based on user applications we don't have access to.

@tcheng I did not notice your comment until now. This is good information, along with @seanguo's findings. I suspect the problem is due to reusing the same engine repeatedly but registering new Java streams for it to use (e.g. for stdio). I will investigate why we're not ever cleaning up abandoned streams.

@headius
Copy link
Member

headius commented Mar 15, 2018

Ok, after a bit more investigation I don't see anything obvious about fileno registration/unregistration. The fileno gets registered when we initialize an IO instance with a stream, and when the IO is closed or finalized it should be unregistering.

We really need a reproduction.

@mtjhax
Copy link

mtjhax commented Mar 28, 2018

I have created an app that recreates what I think is the same (or a related) problem. Unfortunately, the code causing the memory issue is buried in the AWS SDK v1 gem. I'm stepping into their code to try to get a cleaner repro but it's a little time-consuming so I thought I would share what I have including a zipped heap dump: https://gist.github.com/mtjhax/92df4b277e16adfca951bf3d180c4f1a

Note that I am seeing an issue in 9.1.15.0 and not in any earlier version, but the heap dump is showing retained objects similar to what @oozzal posted.

@headius
Copy link
Member

headius commented Apr 10, 2018

@mtjhax Thanks for the heap dump and efforts to simplify a reproduction! I'll have a look at the dump and see what it tells me.

@headius
Copy link
Member

headius commented Apr 10, 2018

I can definitely see the leaked ChannelFD but I do not see the reason for it yet.

@headius
Copy link
Member

headius commented Apr 10, 2018

I may have found a clue!

It seems that almost all of the leaked ChannelFD are associated with either IOReadableByteChannel or IOWritableByteChannel, which simply wrap an IO object and simulate a readable or writable channel. These objects are constructed in two places: in the Java extension to IO to_channel, which is used to get a Channel for a given IO, and in the following lines of code used by IO#copy_stream: readable, writable.

Notice that in the latter copy_stream cases, we construct the channel, wrap it in an IO, and then immediately set autoclose to false. To preserve the underlying stream, we do not want the wrapper to close it automatically; this causes the temporary IO to not "finish" the ChannelFD it contains, and as a result it never unregisters.

I will investigate this and try to come up with a better option.

@headius
Copy link
Member

headius commented Apr 10, 2018

Oh some clarification of this code, and a question.

First off...the logic I pointed out is used if you are doing copy_stream with an object that is not an IO (copy_stream will use that IO's channel directly) or a String (copy_stream will open a file for you) but which defines "read" or "write" methods.

Does this seem like something your code or libraries you call might be doing? Perhaps somewhere in the AWS libraries?

@headius
Copy link
Member

headius commented Apr 10, 2018

I believe I've reproduced the issue with the following code, which gobbles up memory very quickly for all the registered ChannelFD:

io1 = File.open("pom.xml")
io2 = Object.new
def io2.write(stuff); stuff.size; end
loop { IO.copy_stream(io1, io2) }

@mtjhax
Copy link

mtjhax commented Apr 10, 2018

Does this seem like something your code or libraries you call might be doing? Perhaps somewhere in the AWS libraries?

In my case, whatever is going on is coming out of the AWS v1 SDK gem. Going to poke around the code paths I'm using -- should be possible to find something.

@headius
Copy link
Member

headius commented Apr 10, 2018

It looks like we have a winner. I will land a patch shortly that avoids the use of a temporary IO object for "IO-like" streams, and so avoids the nasty little autoclose/unregister dance altogether.

Given this patch, we can make the following comparison based on my script above.

Before:
image

After:
image

The before case continues to grow and grow while the after case levels off. There are also huge pauses in the before case every time it has to copy a multi-gigabyte heap to make room for the leak.

headius added a commit that referenced this issue Apr 10, 2018
Fixes #4796.

The logic here was wrapping any non-IO, non-String, IO-like object
with a readable or writable channel wrapper, itself wrapped in a
new IO object. The outer IO object was set to not autoclose, to
avoid closing the IO-like thing it contained. However these
semantics do not fit the fully-transient nature of the pseudo-
channel, and caused the ChannelFD created to wrap it to leak in
our FilenoUtil registry.

The new logic cleans up some dead paths, renames some variables
for clarity, and avoids using an IO wrapper when we simply need
a pseduo-channel. This avoids leaking ChannelFD and fixes #4796.
@headius
Copy link
Member

headius commented Apr 10, 2018

Ok, I've pushed a fix to jruby-9.1 branch, which will show up as a stable nightly build here: http://jruby.org/nightly

Those of you having issues can check it out. I've just kicked it to generate a new build right now, so it should be there soon.

Still need to write a test that verifies that IO-like read/write streams passed to copy_stream do not get registered.

@mtjhax
Copy link

mtjhax commented Apr 10, 2018

I thank you sir! And I dearly wish that I had time to actually help out by tracking this down in the sources myself but as you might suspect, I am currently engaged in battle with AWS gems.

@headius
Copy link
Member

headius commented Apr 11, 2018

@mtjhax No worries, just let us know if you can confirm it is fixed.

@seanguo @tcheng @fxposter @oozzal Same goes for you if you still have access to the apps in question.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants