-
-
Notifications
You must be signed in to change notification settings - Fork 925
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Weird error using IO.copy_stream, IO duck types and enumerators #4903
Comments
How peculiar! I'll have a look. |
Really? I'd expect something else, I can reproduce it very consistently. How big was your image? |
There's definitely something strange going on. With some larger input I am seeing something similar to what you reported, but it doesn't make any sense. The chunk produced by |
yes, when I mean "consistently", I mean "repeating task and seeing it fail quite often". I'm not sure what I should attribute this to, but I'd say it is some internal buffering issue. My problem is "scaling down" my example to know where this actually happens, as if I try to simplify the code path, suddenly everything works flawlessly. |
I still do not have an explanation for this. Poking at it a bit this afternoon. |
Ok, I think I've figured out the issue. Our Enumerator#next is frequently (usually) backed by a thread, since we do not have a way to do coroutines (e.g. lightweight Fibers) on the JVM. The logic for this thread should pause each time through the loop, waiting for the next item to be requested. In actuality, I believe it is immediately continuing to the next loop, which in this case results in the returned buffer getting overwritten before it can even be dup'ed. I am trying to confirm this by examining logic in |
Due to a bug in how Enumerator#next progresses (it will run an iteration ahead of requested) this bug was exposed as #4903 where an Enumerator#next-based sink for IO.copy_stream showed previously-returned results getting modified after handoff. The same buffer array was being shared across all chunks written, which works ok if that view were only used within the confines of copy_stream, but in this case the chunks were held across more than a single write. Fixes #4903. See #5007 for the Enumerator#next bug.
I've pushed a fix for the issue causing the data to be overwritten after it is returned. The triggering issue, Enumerator#next not waiting for a subsequent call to continue iterating, is in #5007 and will probably be fixed in next major release. |
nice, thx for the fix! I'll test it as soon as I can. |
@headius just tested this, and I can confirm the fix. Thx again for the top-level support! |
When the incoming bytes are on the heap, we can use them in-place for the string we pass to the Ruby write method. However, since the target IO-like may want to modify the incoming string, we must make sure it is marked as shared so our original src bytes will not be modified. This issue originally manifested in jruby#4903 and was fixed by always copying the incoming buffer, but marking the string shared has the same effecit if the target attempts to make any modifications.
When the incoming bytes are on the heap, we can use them in-place for the string we pass to the Ruby write method. However, since the target IO-like may want to modify the incoming string, we must make sure it is marked as shared so our original src bytes will not be modified. This issue originally manifested in jruby#4903 and was fixed by always copying the incoming buffer, but marking the string shared has the same effecit if the target attempts to make any modifications.
Environment
http-form_data
, and with a jpg image (preferably with 46K)Expected Behavior
I have a very similar code to the one from this sample:
(The
puts
calls are to debug and show the error)The purpose of this code is to enumerate the
IO.copy_stream
call, so that its chunks can be managed inside the while block. This code works in MRI (tested with 2.4).The main difference in implementation is that in MRI,
IO.copy_stream
yields chunks of 16384 bytes, while JRuby yields 8192 bytes. I've followed this into this ticket, which leads me to believe that I can't reproduce this bug in older versions of jruby (as they were buffering the source in memory).If you limit the debug statements to
1.
, you'll see these outputs.In the end, the total bytes yielded in both solutions is similar. The gist of it is, the drained chunk must be equal to the yielded chunk.
However, if you limit the debug statements to
2.
, you'll see that this is not the case in JRuby:(check the 3rd yield)
Actual Behavior
As stated, I expect the pairs to be the same all the time.
I couldn't single out exactly what is the problem (The File buffer, the IO.copy_stream call, the enumeration...), and had to completely reproduce my usage to create this short script. But it's definitely a bug.
The text was updated successfully, but these errors were encountered: