Small string values backed by HUGE non-shared ByteList leading to large memory usage under 9k #3019

bbrowning · 2015-06-04T20:02:37Z

After working through this problem a bit in IRC, we've come up with a fairly simple reproduction script here:

https://gist.github.com/bbrowning/90c2296e048fb806b4a1

On JRuby 9k, that script will OOM with the default 512MB heap. Under JRuby 1.7.20, I can run it with only a 16MB heap without issues. This script basically parses a string and creates an array of arrays with the parsed data. You end up with something like:

[[:MSGID, msgid],[:STRING, "Translation 1"],[:MSGID, msgid],[:STRING, "Translation 2"]...]

When looking at a heap dump, those "Translation X" strings are RubyStrings backed by unique ByteLists where each ByteList has a byte[] array containing the entirety of remaining str and then an offset and length pointing to just a tiny portion of the byte array.

So, it's like the ByteLists think they're being shared here but they actually are not being shared, ending up with multiple copies of very large ByteLists. They either need to be shared or not shared but pruned to not contain the excess string data.

The text was updated successfully, but these errors were encountered:

bbrowning · 2015-06-05T15:02:22Z

The following changes get rid of the memory issue on current JRuby master. I doubt it's the right fix, but it at least narrows down the issue.

diff --git a/core/src/main/java/org/jruby/RubyRegexp.java b/core/src/main/java/org/jruby/RubyRegexp.java
index 0d9cfa2..7c767be 100644
--- a/core/src/main/java/org/jruby/RubyRegexp.java
+++ b/core/src/main/java/org/jruby/RubyRegexp.java
@@ -1245,7 +1245,7 @@ public class RubyRegexp extends RubyObject implements ReOptions, EncodingCapable
         }

         if (setBackrefStr) {
-            ((RubyMatchData)match).str = str.newFrozen();
+            ((RubyMatchData)match).str = str;
             match.infectBy(str);
         }

diff --git a/core/src/main/java/org/jruby/RubyString.java b/core/src/main/java/org/jruby/RubyString.java
index ca7af16..5eee9c5 100644
--- a/core/src/main/java/org/jruby/RubyString.java
+++ b/core/src/main/java/org/jruby/RubyString.java
@@ -5356,7 +5356,6 @@ public class RubyString extends RubyObject implements EncodingCapable, MarshalEn
     @JRubyMethod
     public IRubyObject freeze(ThreadContext context) {
         if (isFrozen()) return this;
-        resize(size());
         return super.freeze(context);
     }

This fixes jruby#3019, where in certain cases it's possible to end up with RubyString instances backed by very large unique ByteLists where only a tiny portion of the bytes in the ByteList are actually needed. What was happening is `modify` was being called on RubyString instances too eagerly, resulting in unnecessary duplication of their underlying ByteList instances instead of sharing when possible. The two changes here are in `RubyString#newFrozen` and `RubyString#resize` and I believe are more correct based on the C implementation of these methods. For `RubyString#newFrozen`, which roughly corresponds to `rb_str_new_frozen` in C (https://github.com/ruby/ruby/blob/v2_2_2/string.c#L932), the logic looks to me to return the shared string and I don't see it calling `str_make_independent`, which is roughly the same thing as `RubyString#modify`. So, I just removed the call to `modify`. For `RubyString#resize`, which roughly corresponds to to `rb_str_resize` in C, shared strings are only made independent if their length differ (https://github.com/ruby/ruby/blob/v2_2_2/string.c#L2155). Thus, instead of unconditionally calling `modify` here, I moved it into the conditions that are true when the length differs.

enebo added this to the JRuby 9.0.0.0.rc1 milestone Jun 4, 2015

enebo added the JRuby 9000 label Jun 4, 2015

bbrowning mentioned this issue Jun 5, 2015

Don't call modify() on RubyStrings so eagerly when freezing them #3022

Merged

enebo closed this as completed in #3022 Jun 5, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GitHub Sponsors

Small string values backed by HUGE non-shared ByteList leading to large memory usage under 9k #3019

Small string values backed by HUGE non-shared ByteList leading to large memory usage under 9k #3019

bbrowning commented Jun 4, 2015

bbrowning commented Jun 5, 2015

Small string values backed by HUGE non-shared ByteList leading to large memory usage under 9k #3019

Small string values backed by HUGE non-shared ByteList leading to large memory usage under 9k #3019

Comments

bbrowning commented Jun 4, 2015

bbrowning commented Jun 5, 2015