-
-
Notifications
You must be signed in to change notification settings - Fork 925
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Memory leak off heap on Windows when using File::Stat #3446
Comments
I switch both stat windows structs with threadlocal single instances which would eliminate the allocation of those structs from being a memory leak....The leak still happens! So reviewing 1.7.19->1.7.20 these jnr-related packages have changed:
|
ok good news and bad news. I figured out the leak is due to the addition of the WString type added in 1.7.20. Current stat code has been completely changed to use Windows APIs directly but it also is using WString. So yay we figured out what is leaking... Bad part is WString itself is written fine and jnr-ffi's memory manager seems to be broken by perhaps holding a hard reference to the native string made from WString. This means no pages of memory are ever getting released. I can somewhat fix this by using byte[] directly and not using this helper but I suspect WString is not the only thing in jnr-posix hitting this now.... |
Done working on this for today but we are getting closer to narrowing down where something is holding an extra reference. Somewhat sad to see this has been broken for months but I am guessing Windows has more requirements for special types than the Unixy platforms so people are noticing it a bit less. |
Ok looks like we found a long-standing bug in jffi: https://msdn.microsoft.com/en-us/library/windows/desktop/aa366892%28v=vs.85%29.aspx public void freePages(long address, int pageCount) {
Foreign.VirtualFree(address, (int)this.pageSize() * pageCount, Foreign.MEM_RELEASE);
} This should be: public void freePages(long address, int pageCount) {
Foreign.VirtualFree(address, 0, Foreign.MEM_RELEASE);
} Tip of the hat to @nirvdrum to helping debug this and notice the comment about MEM_RELEASE in the MSDN article. Since we spent more than a day on this I thought I would highlight the journey... By yesterday, we realized that allocating the structs for stat() were not leaking. That only left WString. WString as a parameter to stat was added between 1.7.19 and 1.7.20 so it was likely part of the cause. WString can be a small object. jnr-ffi is smart in that it makes its own virtual paging system and stuffs all these small objects into its own managed pages. This ends up reducing fragmentation. The actual page management itself is very complicated and is heavily dependent on GC + Weak References + Phantom references. Figuring out the paging system was working took quite a while...but it was working. This really only left on other avenue which was that we were not really freeing the native pages. This is what led @nirvdrum to the MSDN page on VirtualFree. So tonight @nirvdrum will confirm the fix works (I have tested on my machine but I think we need a little more confidence). Once he confirms it we will rev jffi and dependent libs and then ask the logstash team to verify the leak is gone. Then we can put out 1.7.23 and hopefully 9.0.4. |
@jsvd https://projectodd.ci.cloudbees.com/view/JRuby/job/jruby-1.7-dist/444/org.jruby$jruby-dist/ Can you test this. It should have the magic for the leak at least. |
@enebo something strange is happening with that build. Now the unusable memory goes up and down so it seems the allocations are being freed correcltly. However, after running the file stat loop for 10-15 minutes (if I have jvisualvm looking at the instance it can happen in 1 minute or 2), the process crashes: I can replicate this quite reliably, but I haven't found a way to extract more information about what's happening to cause this. |
@jsvd I did see it crash once but it was over an hour and a half or so and I was not sure if it was real or not. I did not get an actual crash file either. I will examine GetFileAttributesEx and see if there is some other obscure note I am missing. |
Hi @enebo. Is there any progress on this issue? Do you have an ETA on when it might get fixed? Thanks. |
Hi @astefan ! I think this should be fixed in the 9.0.4.0 release. Have you tried it? |
Hi @donv. We will need this fix in the 1.7.x branch. I guess it should go in 1.7.23. Can you confirm the fix will go into this version as well? |
I am not responsible for doing it, but the fix should go into 1.7.23. It is however not a direct merge, so it has to be reimplemented for the jruby-1_7 branch, so if you are able to test the jruby-1_7 branch to confirm the fix that would ensure the fix is OK for 1.7.23. |
@jsvd I double check it - current jruby-9.0.4.0 and jruby-1_7 are using the same jffi library which is the latest which includes the patch from @enebo. a recent dist file is here https://projectodd.ci.cloudbees.com/view/JRuby/job/jruby-1.7-dist/456/org.jruby$jruby-dist/ but I would say that is the same version you already tested and which crashes the process. |
@mkristian right, I only know that Tom found another way to fix the problem after the previous fix made jruby crash. I don't know if it has reached github yet.. |
@astefan @jsvd @donv @mkristian The current plan is to get 1.7.23 out this coming Monday. We tried to get it out before Rubyconf but we could not get it squeezed in. It should definitely be fixed for stat(). I was able to perform >2 billion stats (I interrupted after that). |
@enebo Looking at the release notes, is this issue the one in the highlight described as "Windows stat no longer leaks memory (broken since 1.7.20)" or didn't it make it into 1.7.23? |
@jakommo yes this was that bullet but it did not make it into the list. I will update on website at least... |
@enebo roger that, thanks for the clarification 👍 |
confirmed 1.7.23 fixes the problem. thanks for the awesome response and fix @enebo |
thanks 👍 |
Where is the commit/PR for this on the 1.7 branch? Can't locate it, debugging #3505 since it breaks JRuby for Windows in our use case. |
This affects all jruby versions since 1.7.20 inclusive:
1.7.11 works OK
1.7.18 works OK
1.7.19 works OK
1.7.20 NOT OK
1.7.22 NOT OK
9.0.3.0 NOT OK
Every time a
File::Stat.new(path)
or aFile.stat(path)
is executed, even if the context disappears there are chunks of memory that are not freed.This can cause, for example, jruby instances to show a healthy jvm heap usage of < 150 mb while the process on windows uses 3gb.
This can be tested by executing
bin\jruby -e " loop { File.stat('bin') }"
and watching in VMMAP (https://technet.microsoft.com/en-us/sysinternals/dd535533.aspx) the amount of unusable memory.In this image collection you can see the normal memory usage of 1.7.19 and the of the non-freed allocations in 1.7.20
The text was updated successfully, but these errors were encountered: