-
-
Notifications
You must be signed in to change notification settings - Fork 925
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Data read from a UDP socket retains buffer size causing large memory overhead #5148
Comments
This was introduced in commit 388030b. The intention was to avoid always copying the backing array. I personally can see why this is unexpected behavior but I can also see that reverting will affect people who know how much data they will receive (well its UDP but let's pretend it gives more or less what you expect). So I wonder if we should apply a heuristic here? Don't copy if n% of length is filled from the recvfrom (n == 100%/80%???), but if not then call with copy constructor which will right-size the String made. I am looking for opinions. The revert and the heuristic are trivial to do but what is the best way to give the speed benefit while not creating this massive over allocation in a case where you get much less back than you asked for? |
So we discussed and looked into this a bit over hangouts, and have the following findings:
buflen = NUM2INT(len);
str = rsock_strbuf(str, buflen);
At this point I'm leaning toward having a heuristic, where if the requested size is greater than (for example) 1.5x the actual data size, we crop it when returning the string result. However, in both JRuby and CRuby it appears that every call to recvfrom(64k) is going to allocate (and possibly throw away, if we copy) a 64k buffer every time. |
Yeah so the second idea we talked about was potentially doing n recvfroms with a bounded allocated buffer in a loop and building up a string. This felt a bit dubious to us because you are requesting something fairly specific (a single 64k recv_from). Us doing n of those calls for you maybe is a semantic gap too far. As an aside, does UDP really ever reliably read a 64k data packet? I think that is a theoretical limit but one I doubt will work very often. Seems like reducing that size in your code could end up speeding things up because we won't speculatively try and allocate a large buffer for your call. |
It's a matter of safe defaults, we don't want to surprise users by defaulting to truncation at an arbitrary length. As for CRuby doing the same, I can see that both 1.7.27 and cruby 2.4.2 have a much lower memory footprint at the end of the script being executed: Ruby 9k:
JRuby 1.7.27
CRuby 2.4.2
|
@jsvd well that is interesting. Either something we missed in MRI upon making the string originally or something else which is reducing the unused backing store in the COW stuff in MRIs String impl. Yeah I can see the use case that when you don't know what will be arriving you just grab as much as you can. My question was mostly just me remembering how flaky UDP can be. |
@jsvd Hmm I'll have to go over the code in MRI again then. If they're truncating the buffer somewhere, I can't see it. There might be a realloc call hidden in there somewhere hiding the truncation. |
on JRuby 9k, a read from the UDP socket will not truncate unused buffer space so we have to force it using String#s
on JRuby 9k, a read from the UDP socket will not truncate unused buffer space so we have to force it using String#s ensure the cloned string is UTF_8 encoded
I'm not sure if I missed this the first time around (it's clearly in the code) but it seems MRI resizes immediately after the recv: https://github.com/ruby/ruby/blob/master/ext/socket/init.c#L204 Looking into a fix now. |
FWIW I ran into this bug because it's marked for 9.1.18 but never got closed or moved to a new release. |
Ok rb_str_set_len does not resize; it just forces the String's reported size to a specific value. Still looking for how they avoid keeping a 64k string around. |
Yeah I still have no answers. I pinged my favorite CRuby expert @tenderlove to see if he can figure out why MRI doesn't keep 64k buffers around, but otherwise I'm going to proceed with a 1.5x heuristic to copy rather than wrap the buffer. |
This happens on jruby 9k (
not sure about 1.7.x):When this is run, a large part of the heap will be used by 64k RubyStrings containing only "test.#{i}":
This is causing: elastic/logstash#9346
For reference, this doesn't happen on jruby-1.7.27:
The text was updated successfully, but these errors were encountered: