Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IO.copy_stream doesn't seem to limit the chunk size #4701

Closed
janko opened this issue Jul 2, 2017 · 4 comments
Closed

IO.copy_stream doesn't seem to limit the chunk size #4701

janko opened this issue Jul 2, 2017 · 4 comments

Comments

@janko
Copy link

janko commented Jul 2, 2017

Environment

$ jruby -v
jruby 9.1.12.0 (2.3.3) 2017-06-15 33c6439 Java HotSpot(TM) 64-Bit Server VM 25.40-b25 on 1.8.0_40-b27 +jit [darwin-x86_64]
$ uname -a
Darwin Jankos-MacBook-Pro-2.local 16.6.0 Darwin Kernel Version 16.6.0: Fri Apr 14 16:21:16 PDT 2017; root:xnu-3789.60.24~6/RELEASE_X86_64 x86_64

Expected Behavior

When I use IO.copy_stream on MRI, it limits the chunk size to 16KB.

require "stringio"

class FakeIO
  def write(data)
    p(data.bytesize)
  end
end

io = StringIO.new(100*1024)

IO.copy_stream(io, FakeIO.new)
16384
16384
16384
16384
16384
16384
4096

Actual Behavior

On JRuby IO.copy_stream appears not to limit the chunk size, it just always reads the entire content at once, because the output of the above script is

102400

I tried with a StringIO up to 10MB of size, and the behaviour is still the same, all 10MB of content is read at once. The behaviour is the same if I change the source IO into a File object.

janko added a commit to janko/http that referenced this issue Jul 2, 2017
IO.copy_stream on JRuby doesn't seem to limit the chunk size, it always
reads the entire content at once.

jruby/jruby#4701

Since we don't care about what is the size of yielded chunks, we just
test whether the yielded chunks sum up to the entire content.

While here we also bring back HTTP::Request::Body#each returning an
Enumerator, so that tests can be simpler.
ixti pushed a commit to httprb/http that referenced this issue Aug 18, 2017
IO.copy_stream on JRuby doesn't seem to limit the chunk size, it always
reads the entire content at once.

jruby/jruby#4701

Since we don't care about what is the size of yielded chunks, we just
test whether the yielded chunks sum up to the entire content.

While here we also bring back HTTP::Request::Body#each returning an
Enumerator, so that tests can be simpler.
@headius
Copy link
Member

headius commented Sep 7, 2017

If you're going from a file to a file we use the builtin Java implementation of FileChannel.transfer, which will do whatever's efficient on that platform. For non-file copying, we do it manually via a loop, and this is likely where we're not choosing a reasonable block size. I'll have a look.

@janko
Copy link
Author

janko commented Nov 13, 2017

This snippet shows that IO.copy_stream in latest JRuby (9.1.14.0) passes nil as the length argument when calling #read on the source IO object, which is the same as not passing the length argument at all and will read the entire content of the IO object into a string.

source      = StringIO.new("foo")
destination = File::NULL

def source.read(*args)
  p args
  super
end

IO.copy_stream(source, destination)
# output on JRuby
[nil]

I think this is problematic because the reason people use IO.copy_stream is because it will copy the contents from source to destination in a memory-efficient way. If JRuby's implementation effectively does destination.write source.read, then for larger files the server could easily run out of memory.

MRI's IO.copy_stream passes both a length argument (of 16KB) and a buffer string to #read, so MRI the output of the previous script is the following:

# output on MRI
[16384, ""]
[16384, "foo"]

To provide background, I'm a maintainer of two libraries which handle file uploads and use IO.copy_stream with non-file IO objects (so JRuby cannot switch to the built-in Java implementation). The second library implements the tus resumable upload protocol, and that one is expected to handle very large file uploads (multiple GB), so I rely on IO.copy_stream for memory-efficient streaming.

@headius
Copy link
Member

headius commented Nov 14, 2017

Fixed along with #4842.

@headius headius closed this as completed Nov 14, 2017
@headius headius added this to the JRuby 9.1.15.0 milestone Nov 14, 2017
@janko
Copy link
Author

janko commented Nov 14, 2017

Awesome, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants