Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix memory/disk usage leak when creating many ScriptingContaiers #4747

Conversation

jpinsonault
Copy link
Contributor

This PR is to resolve #3928, File descriptors from stdlib jars are leaked when ScriptingContainers are terminated

It's very much a work in progress, and I was hoping to get some feedback on

  1. If this is close to a reasonable approach
  2. What we would need to do to move this closer to a good solution
  3. What kind of testing/validation you'd like to see to be comfortable with this change
  4. Is cleaning up these files even possible/desirable when the scripting container isn't using a single threaded scope. i.e. threadsafe/concurrent/singleton. They seem to share references and I'm not sure how or if their use cases should be handled.

This biggest problem I currently see with it is that I was unable to find a way to cleanup the jars and references without making the JarResource class public. Since I'm not very experienced with this codebase, I'm especially interested in your opinions on how to clean up the jars in a cleaner way.


The basic strategy here is:

  1. In JRubyClassLoader: When jars are extracted to a temporary folder, keep track of their paths
  2. When a single threaded scripting container is terminated, its SingleThreadLocalContextProvider based context provider goes through each of the temp jars, and cleans up the jarCache references and deletes the jars from disk

In the case of a single threaded container, this should be safe. My understanding is that each single threaded container has its own class loader and context provider. Additionally, each time a container causes the class loader to extract a stdlib jar into a temp directory, it creates a uniquely named file for that class loader. So, removing these references will not affect any other scripting containers.

Since our use case in #3928 primarily involves using the single threaded local context scope, I've focused on that. But if this work should involve improvements to the other context providers we'd be happy to implement that as well.

So far I've been testing this manually by just creating scripting containers in a loop, giving them a script that requires a few stdlib jars, and verifying that the jars are created and destroyed on disk as expected. I've also been using yourkit to verify that the jar cache doesn't leak references

@@ -160,6 +166,10 @@ private static String systemTmpDir() {
return "";
}

public List<URL> getTempUrls(){
Copy link
Member

@kares kares Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be great to have the naming a bit consistent with existing API e.g. addURL(URL)
... getCachedURLs() or maybe indicate that there's temp file caching (that those URLs are actually local files)
in which case we should rather return paths than URLs, makes sense (getTempPaths or getCachedPaths)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that makes sense. What about cachedJarPaths and getCachedJarPaths? Or is that too verbose?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks much better than the current naming ... go for whatever you feel is appropriate.
getter sounds fine to me esp. if its simply returning a field

Since these are always paths to files, cache the paths instead of URLs
Also rename tempUrls to cachedJarPaths
@jpinsonault
Copy link
Contributor Author

I pushed a commit changing the variable name to cachedJarPaths, and changed it to hold String paths instead of URLs

@kares
Copy link
Member

kares commented Aug 29, 2017

looking good, is it still a WiP?
did you consider having a try {} catch around the jarFile.delete, to silence IOExceptions, would it make sense to not halt the termination in case of not being able to delete (e.g. file deleted previously) ?

@@ -86,6 +90,8 @@ public void addURL(URL url) {
out.close();
in.close();
url = f.toURI().toURL();

cachedJarPaths.add(url.getPath());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure how much its relevant but URL had issues - hopefully not for file: protocol but it might be worth a check
... why org.jruby.util.URLUtil.getPath exists and whether it should be used here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into the issue, and from what I was able to understand, people recommend using Paths/Path objects if you're on java 7 or higher. Does jruby need to support java 6?

I also saw recommendations to use url.toURI().getSchemeSpecificPart() to get the String path, which is what URLUtil.getPath does, so it might be easier, and safe, to just use that utility function here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, feel free to update to using URLUtil than. also JRuby 9 only supported Java >= 7

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@jpinsonault
Copy link
Contributor Author

If it's looking good to you then I guess it's not a WIP.

According to https://docs.oracle.com/javase/7/docs/api/java/io/File.html#delete(), File.delete doesn't throw an IOException, and just returns false if the file couldn't be deleted. Which seems like fine behavior to me. I tried putting it in a try catch with an IOException and the compiler also complains that it will never throw that exception

@jpinsonault jpinsonault changed the title [Work in progress] Fix memory/disk usage leak when creating many ScriptingContaiers Fix memory/disk usage leak when creating many ScriptingContaiers Aug 29, 2017
@kares
Copy link
Member

kares commented Sep 2, 2017

PR looking good, but I do not understand why its only tied to SingleThreadLocalContextProvider,
you just do not care about the other context-providers cause that is the one you're using?
but if this to be shipped it could be done for all, or any reason no to have the same cleanup there?

@kares kares added this to the JRuby 9.2.0.0 milestone Sep 2, 2017
@kares kares added the embed label Sep 2, 2017
@headius
Copy link
Member

headius commented Oct 31, 2017

@jpinsonault @kares This is probably something that would be good to put in 9.1.14, right? Can we land it yet?

@headius
Copy link
Member

headius commented Oct 31, 2017

cc @enebo

@enebo
Copy link
Member

enebo commented Oct 31, 2017

@headius @jpinsonault @kares if this is good on a single provider and solid I say it is better than nothing. Expecting all providers to be done would be great but this will make .14 better... SHIP IT! :)

@jpinsonault If this can be done in the other providers are you up for adding it to those too?

@headius
Copy link
Member

headius commented Oct 31, 2017

I will merge this and extend it to the other providers.

@headius
Copy link
Member

headius commented Oct 31, 2017

I believe it makes most sense to move this into JRubyClassLoader's close/teardown logic, which is already called by ScriptingContainer.terminate. I'll do that and get everything merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

File descriptors from stdlib jars are leaked when ScriptingContainers are terminated
4 participants