IO.copy_stream doesn't seem to limit the chunk size #4701

janko · 2017-07-02T10:22:08Z

Environment

$ jruby -v
jruby 9.1.12.0 (2.3.3) 2017-06-15 33c6439 Java HotSpot(TM) 64-Bit Server VM 25.40-b25 on 1.8.0_40-b27 +jit [darwin-x86_64]
$ uname -a
Darwin Jankos-MacBook-Pro-2.local 16.6.0 Darwin Kernel Version 16.6.0: Fri Apr 14 16:21:16 PDT 2017; root:xnu-3789.60.24~6/RELEASE_X86_64 x86_64

Expected Behavior

When I use IO.copy_stream on MRI, it limits the chunk size to 16KB.

require "stringio"

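# Destination that prints the size of every chunk it receives
class FakeIO
  def write(data)
    p(data.bytesize) # p returns its argument, so #write returns the byte count
  end
end

io = StringIO.new("a" * (100 * 1024)) # 100KB source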

IO.copy_stream(io, FakeIO.new)

Actual Behavior

On JRuby, IO.copy_stream appears not to limit the chunk size. It always reads the entire content at once, as the output of the above script shows:

I tried a StringIO of up to 10MB, and the behaviour is still the same: all 10MB of content is read at once. The behaviour is also the same if I change the source IO to a File object.


IO.copy_stream on JRuby doesn't seem to limit the chunk size; it always reads the entire content at once (jruby/jruby#4701). Since we don't care about the size of the yielded chunks, we just test whether the yielded chunks sum up to the entire content. While here, we also bring back HTTP::Request::Body#each returning an Enumerator, so that tests can be simpler.
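The size-agnostic check described in that commit message can be sketched in plain Ruby. This is not the http gem's actual test: `ChunkRecorder` is a hypothetical destination, and it returns the byte count from `#write` because IO.copy_stream (at least on MRI) adds up those return values to compute its own result.

```ruby
require "stringio"

# Hypothetical destination that records the size of each chunk it receives
class ChunkRecorder
  attr_reader :sizes

  def initialize
    @sizes = []
  end

  def write(data)
    @sizes << data.bytesize
    data.bytesize # report how many bytes were "written"
  end
end

content  = "a" * (100 * 1024)
recorder = ChunkRecorder.new
copied   = IO.copy_stream(StringIO.new(content), recorder)

# Don't assert on individual chunk sizes, only that they add up to the whole
raise "chunks don't sum to the content" unless recorder.sizes.reduce(0, :+) == content.bytesize
puts "copied=#{copied} chunks=#{recorder.sizes.length} largest=#{recorder.sizes.max}"
```

Printing the largest chunk alongside the total makes the difference between implementations visible without making the assertion depend on it.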

headius · 2017-09-07T22:17:38Z

If you're going from a file to a file, we use the built-in Java implementation (FileChannel's transferTo/transferFrom), which does whatever is efficient on that platform. For non-file copying we do it manually in a loop, and that is likely where we're not choosing a reasonable block size. I'll have a look.
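The manual, non-file path comes down to a read/write loop with a bounded block size. Below is a minimal Ruby sketch of such a loop, not JRuby's actual (Java) implementation; `chunked_copy` and `BLOCK_SIZE` are hypothetical names, and the 16KB value is borrowed from the MRI behaviour described in this issue. It assumes the source supports `#read(length, buffer)` with IO semantics, i.e. it returns nil at EOF when given a positive length.

```ruby
require "stringio"

# Assumed block size, matching the 16KB chunks MRI produces
BLOCK_SIZE = 16 * 1024

# Hypothetical helper: copies src to dst while holding at most one block in memory
def chunked_copy(src, dst, block_size = BLOCK_SIZE)
  buffer = String.new # reused for every block
  total = 0
  # With a positive length, #read fills `buffer` and returns nil at EOF
  while src.read(block_size, buffer)
    total += dst.write(buffer)
  end
  total
end

src    = StringIO.new("x" * 40_000)
dst    = StringIO.new
copied = chunked_copy(src, dst)
```

Because the same buffer is reused for every block, memory use stays bounded by the block size no matter how large the source is.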

janko · 2017-11-13T10:50:15Z

This snippet shows that IO.copy_stream in the latest JRuby (9.1.14.0) passes nil as the length argument when calling #read on the source IO object. That is the same as not passing a length at all, so #read returns the entire content of the IO object as a single string.

source      = StringIO.new("foo")
destination = File::NULL

def source.read(*args)
  p args
  super
end

IO.copy_stream(source, destination)

# output on JRuby
[nil]

I think this is problematic, because people use IO.copy_stream precisely because it copies the contents from source to destination in a memory-efficient way. If JRuby's implementation effectively does destination.write(source.read), then for larger files the server could easily run out of memory.
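The memory concern comes down to the difference between an unbounded and a bounded #read. A small illustration on a StringIO (the 1MB size is arbitrary):

```ruby
require "stringio"

io = StringIO.new("a" * (1024 * 1024)) # 1MB of data

whole = io.read(nil)    # same as io.read: returns everything that remains
io.rewind
chunk = io.read(16_384) # bounded: returns at most 16KB

puts whole.bytesize # 1048576
puts chunk.bytesize # 16384
```

With `read(nil)` the whole source ends up in one string, so peak memory grows with the size of the upload; a positive length caps it.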

MRI's IO.copy_stream passes both a length argument (of 16KB) and a buffer string to #read, so on MRI the output of the previous script is the following:

# output on MRI
[16384, ""]
[16384, "foo"]
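The `"foo"` in the second MRI line is not a second copy of the data. MRI passes the same buffer string to every #read call, and #read fills that buffer in place and returns it, so printing the arguments before calling `super` shows the buffer still holding the previous chunk. The sketch below reproduces this with a plain StringIO; the variable names are illustrative.

```ruby
require "stringio"

source = StringIO.new("foo")
buffer = String.new # caller-owned buffer, like the one MRI passes

first = source.read(16_384, buffer)
p first.equal?(buffer) # true: the data was written into the caller's buffer
p buffer               # "foo"

# The next call receives the same, now non-empty, buffer; at EOF it returns nil
second = source.read(16_384, buffer)
p second               # nil
```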

To provide background, I'm a maintainer of two libraries which handle file uploads and use IO.copy_stream with non-file IO objects (so JRuby cannot switch to the built-in Java implementation). The second library implements the tus resumable upload protocol, and that one is expected to handle very large file uploads (multiple GB), so I rely on IO.copy_stream for memory-efficient streaming.
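For non-file sources like these, IO.copy_stream only needs a duck type whose #read follows IO semantics: respect the length, fill the buffer if one is passed, and return nil at EOF when a positive length is given. A minimal sketch, assuming the data arrives as an enumerable of string chunks; `ChunkedSource` is a hypothetical class, not part of either library.

```ruby
require "stringio"

# Hypothetical IO-like source that serves bytes from an enumerable of String chunks
class ChunkedSource
  def initialize(chunks)
    @chunks  = chunks.each # external Enumerator over the chunks
    @pending = String.new  # fetched bytes not yet handed out
    @eof     = false
  end

  def read(length = nil, outbuf = nil)
    # Pull chunks until we have `length` bytes, or everything if length is nil
    until @eof || (length && @pending.bytesize >= length)
      begin
        @pending << @chunks.next.b # .b: work in binary to avoid encoding clashes
      rescue StopIteration
        @eof = true
      end
    end

    # IO semantics: a positive length at EOF yields nil (and an emptied buffer)
    if length && length > 0 && @pending.empty?
      outbuf&.clear
      return nil
    end

    data = length ? @pending.byteslice(0, length) : @pending.dup
    @pending = @pending.byteslice(data.bytesize..-1)
    outbuf ? outbuf.replace(data) : data
  end
end

source = ChunkedSource.new(["part1-", "part2-", "part3"])
dest   = StringIO.new
copied = IO.copy_stream(source, dest)
```

Only the chunks needed for the current read are pulled from the enumerator, so memory stays proportional to the requested length rather than to the total size.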

headius · 2017-11-14T23:24:26Z

Fixed along with #4842.

janko · 2017-11-14T23:32:51Z

Awesome, thank you!

janko mentioned this issue Jul 2, 2017: Further upload improvements httprb/http#418 (Closed)

janko mentioned this issue Nov 13, 2017: Support large files, reduce memory usage carrierwaveuploader/carrierwave-aws#116 (Merged)

headius closed this as completed Nov 14, 2017

headius added this to the JRuby 9.1.15.0 milestone Nov 14, 2017

HoneyryderChuck mentioned this issue Dec 18, 2017: Weird error using IO.copy_stream, IO duck types and enumerators #4903 (Closed)
