Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Very large retained heap size for org.jruby.RubyRegexp$RegexpCache in JRuby Rails App #2078

Closed
AshMcConnell opened this issue Oct 29, 2014 · 8 comments
Assignees
Labels
Milestone

Comments

@AshMcConnell
Copy link

I've asked a question in stackoverflow, but I'll paste here for convenience

"We have analysed a heap dump file for our application (running on Tomcat with jruby 1.7.8).

It shows us that the retained heap size is very large (439,459,128) for the class org.jruby.RubyRegexp$RegexpCache. This is 48% of our memory usage

Looking at the source code for that file it is 3 final static object created at startup (patternCache / quotedPatternCache / preprocessedPatternCache)

This seems to be a pretty core part of JRuby. My question is, is it normal to have such a large percentage of the heap to be dedicated to this cache?
"

http://stackoverflow.com/questions/26630565/very-large-retained-heap-size-for-org-jruby-rubyregexpregexpcache-in-jruby-rail

Thanks for your help
All the best,
Ash

@AshMcConnell
Copy link
Author

We've tested this out and let it run until destruction. The regexp cache fills up the heap and the transaction time increases and increases (90 secs instead of 1-2 secs). Is there any way to limit the size of this cache?

@yousuketto
Copy link
Contributor

same thing happened in my environment.

repro code
# example.rb
count = 0
while(true)
  count += 1
  Regexp.new("test#{count}")
end
c:\workspace\sandbox\reg-cache>jruby -v
jruby 1.7.16.1 (1.9.3p392) 2014-10-28 4e93f31 on Java HotSpot(TM) Client VM 1.7.0_17-b02 +jit [Windows 7-x86]

c:\workspace\sandbox\reg-cache>jruby -J-verbose:gc -J-XX:+HeapDumpOnOutOfMemoryError example.rb
[GC 4412K->1804K(15872K), 0.0141832 secs]
[GC 6220K->2974K(15872K), 0.0181254 secs]
[GC 7390K->4579K(15872K), 0.0201395 secs]
[GC 8995K->5779K(15872K), 0.0158137 secs]
[GC 10195K->6735K(15872K), 0.0126515 secs]
...
[Full GC 494975K->494974K(494976K), 0.7089635 secs]
[Full GC 494975K->494975K(494976K), 0.8630243 secs]
[Full GC 494975K->494975K(494976K), 0.9114314 secs]
[Full GC 494975K->494975K(494976K), 0.9192651 secs]
[Full GC 494975K->494975K(494976K), 0.8271494 secs]
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid7940.hprof ...
Heap dump file created [526303819 bytes in 8.361 secs]
[Full GC 494975K->6176K(494976K), 0.1836955 secs]
[Full GC 64846K->6187K(494976K), 0.0662034 secs]
Error: Your application used more memory than the safety cap of 500M.
Specify -J-Xmx####m to increase it (#### = cap size in MB).
Specify -w for full OutOfMemoryError stack trace

@AshMcConnell
Copy link
Author

We have found the source of the problem, it was slightly more hidden than your reproduction. We were doing a substitution (when trying to filter params) that happened on every transaction.

config.filter_parameters += known_sensitive_fields << lambda { |key, value|
      if value.present? && API::ExpectedFields.request_keys.exclude?(key) then
        value.sub!(value, "[FILTERED]") unless API::ExpectedFields.is_dynamic?(key)
      end
    }

The value.sub! in there was the problem, the value was changing with every request and being added to the Regex Cache. We fixed the problem by using value.replace("[FILTERED]") instead - no more leak.

I do think that the cache entries should perhaps have a TTL or a maximum size to avoid something like this.

@cheald
Copy link
Contributor

cheald commented Nov 13, 2014

I can't reproduce this on 1.7.16.1 or master - the SoftReferences get collected reliably.

Using:

count = 0
while(true)
  count += 1
  Regexp.new("test#{count}" * 500000)
end

Master:

1.7.16.1:

$ java -version
java version "1.7.0_45"
OpenJDK Runtime Environment (fedora-2.4.3.0.fc18-x86_64 u45-b15)
OpenJDK 64-Bit Server VM (build 24.45-b08, mixed mode)

@yousuketto
Copy link
Contributor

@cheald
Thank you for confirmation.

It seems GC collect except system to access to RegexpCache at high frequency.

  • "access to RegexpCache at high frequency": ex) when interval time to make Regexp object is shorter than -XX:SoftRefLRUPolicyMSPerMB value

I think, reason to not reproduce at cheald's code is interval time to make Regexp object becomes longer at cheald' code.

In actuary, "access to RegexpCache at high frequency" does not occur nearly.
So, no problem at my system.

thank you.

@headius
Copy link
Member

headius commented Nov 13, 2014

Thank you for reporting this. Yes, this implementation is insufficient. We are exploring another.

cheald added a commit to cheald/jruby that referenced this issue Nov 13, 2014
…ache

keys are shorter-lived than the SoftReferences used before, but it will
allow for the rapid expiration of memory-hungry one-shot regexes as well.

Fixes jruby#2078
@AshMcConnell
Copy link
Author

Thanks guys - really glad that our 2 weeks of pain has at least resulted in JRuby being improved. Thanks for your hard work!

@headius
Copy link
Member

headius commented Nov 23, 2014

Should be fixed now in jruby-1_7 and master. The new cache appears to run @cheald's benchmark 3x faster than MRI, 2x faster than Rubinius, and retained heap holds steady.

Not sure how best to write a test for this. Will try to add one into the JUnit suite.

@headius headius closed this as completed Nov 23, 2014
@headius headius added this to the JRuby 1.7.17 milestone Nov 23, 2014
@headius headius added the core label Nov 23, 2014
@headius headius self-assigned this Nov 23, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants