-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve HTTP::ChunkedContent implementation #5928
Conversation
src/http/content.cr
Outdated
@@ -72,6 +72,10 @@ module HTTP | |||
bytes_read = @io.read slice[0, to_read] | |||
@chunk_remaining -= bytes_read | |||
|
|||
if bytes_read < to_read |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What? No you can't assume this! EOF is when bytes_read == 0
. Read could always return exactly 1 byte per call and you have to handle that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Argh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You only want to act when bytes_read == 0
(eof) && @chunk_remaining > 0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's what I had in peek
already.
ed475eb
to
907b9ce
Compare
Well, to be honest there was nothing to fix: the input is invalid so it raised. Raising an explicit exception is nicer, but if I'd stumble upon them, I wouldn't understand the messages... Maybe something more explicit, such as:
|
@ysbaddaden you're right, I would've trouble to grasp the meaning as well... and I wrote it this morning 🙈 |
@ysbaddaden before, you'd get a nil error if the EOF appeared when parsing the chunk size, but not if you got an EOF in the middle of a chunk (the EOF would just get passed through). This PR changes both to an error, so it is adding new errrors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just remembered we have IO::EOFError
and #read_line
. I think they'd be enough, and even spare the need for a message. What do you think?
src/http/content.cr
Outdated
@@ -134,7 +143,7 @@ module HTTP | |||
|
|||
private def read_chunk_start | |||
read_chunk_end | |||
@chunk_remaining = @io.gets.not_nil!.to_i(16) | |||
read_chunk_remaining |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just remembered that we have read_line
that raises an IO::EOFError
. Wouldn't it be enough to deal with this?
src/http/content.cr
Outdated
|
||
if bytes_read == 0 | ||
raise IO::Error.new("Invalid HTTP chunked content: expected data but got EOF (missing #{@chunk_remaining} more bytes)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should raise an IO::EOFError
(I forgot we had it). I'm not sure it needs a message: exception class says it's an unexpected EOF, backtrace says where it came from, and I don't think the expected remaining byte count is important.
I.e. best effort to report an invalid protocol/broken communication and allow to deal with it (log, and/or silently close) better than a blunt nil error.
5a5fd72
to
fc10b25
Compare
I noticed a few other issues where the implementation was incomplete and added some improvements to comply with RFC 7230.
|
fc10b25
to
e852e56
Compare
src/http/content.cr
Outdated
end | ||
|
||
private def read_delimited_line | ||
line = @io.read_line(16_384, chomp: false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does this number 16_384
come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It' 2**14 = 16kiB and a fairly common default value for maximum line length in HTTP header implementations and copied from http/common.cr
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh ok, could you add a comment? (and/or put this in a constant?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Constant please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest to do that in a small subsequent PR together with the other usages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we have to enforce CRLF. We can rely on IO#read_line
to raise the IO::EOFError
and to already chomp the line. No need to repeat what's existing.
src/http/content.cr
Outdated
# Read "\r\n" | ||
@io.skip(2) | ||
private def read_crlf | ||
bytes = Bytes.new(2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bytes = uninitialized UInt8[2]
Otherwise 2 bytes are allocated on the HEAP memory each time the method is called 😨
src/http/content.cr
Outdated
end | ||
|
||
private def read_delimited_line | ||
line = @io.read_line(16_384, chomp: false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Constant please.
src/http/content.cr
Outdated
|
||
check_chunk_remaining_is_zero | ||
if bytes_read == 0 | ||
raise IO::EOFError.new("Invalid HTTP chunked content: unexpected end of file") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the message is a little redundant? (EOFError ~ unexpected EOF).
src/http/content.cr
Outdated
chunk_size = line | ||
end | ||
|
||
@chunk_remaining = chunk_size.to_i?(16) || raise IO::Error.new("Invalid HTTP chunked content: invalid chunk size (#{chunk_size.dump})") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's avoid allocating a string: "invalid chunk size" is enough, not need to dump the line.
Since we ignore the optional headers, and those are fairly rare, I wonder whether we really have to seek for a potential ;
. Maybe that would suffice:
line.to_i(16, strict: false) { raise IO::Error.new }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's not really commonly used. But I'd leave it in to be RFC compliant. It's implemented anyway.
But I wondered whether string allocation can be avoided... maybe if we could read an int from an IO? This could be useful in other cases as well.
A hex encoded Int32 has 8 digits max (+ sign). So it could be stored in a UInt8[9]
...
@ysbaddaden The RFC explicitly requires CRLF. Some implementations (like Go's) enforce that. But Firefox, Chrome, Edge, curl all accept |
src/http/content.cr
Outdated
# All headers in the trailing headers section will be returned. Applications | ||
# need to make sure to ignore them or fail if headers are not allowed | ||
# in the chunked trailer part (see [RFC 7230 section 4.1.2](https://tools.ietf.org/html/rfc7230#section-4.1.2)). | ||
getter headers : HTTP::Headers = HTTP::Headers.new |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could it be changed to lazy initialized version?
getter headers : HTTP::Headers { HTTP::Headers.new }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just two minor comments, but it looks great from here.
src/http/content.cr
Outdated
|
||
# Returns trailing headers read by this chunked content. | ||
# | ||
# Thiel value will only be populated once the entire content has been read, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is "thiel" a misspelling here? Or i need to get a new dictionary?
@chunk_remaining = @io.gets.not_nil!.to_i(16) | ||
check_last_chunk | ||
@read_chunk_start = false | ||
private def read_crlf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add a spec for allowing just \n
as crlf?
@@ -9,7 +9,7 @@ describe HTTP::ChunkedContent do | |||
bytes_read = content.read(bytes.to_slice) | |||
bytes_read.should eq(4) | |||
String.new(bytes.to_slice).should eq("123\n") | |||
mem.pos.should eq(7) # only this chunk was read | |||
mem.pos.should eq(9) # only this chunk was read |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This spec shouldn't change, see #3270. There's even a nice comment linking to this issue, right next to code you changed later in this PR.
src/http/content.cr
Outdated
@@ -61,7 +61,7 @@ module HTTP | |||
getter headers : HTTP::Headers { HTTP::Headers.new } | |||
|
|||
def initialize(@io : IO) | |||
@chunk_remaining = 0 | |||
@chunk_remaining = -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe use nil
instead as an "invalid value" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's only an internal variable so it should be fine to use a special value to determine initial state.
If this should be changed, it would probably make more sense to add an additional ivar like @initial_read
. That would be more explicit. Not that it matters much, but making it nilable would add 12 bytes, a Bool ivar only 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more like 8 bytes for the boolean once you align. We should do the ivar reordering/packing that rust does eventually...
@RX14 The issue from #3270 was not compromised, the example provided there already worked with the code before the last commit. For the matter of that issue, it's irrelevant if CRLF at the end of the chunk is read immediately or not (i.e. 7 or 9 bytes) because it is already sent together with the chunk data anyway. The issue in #3270 (which was fixed in 4e00257) was, when a chunk was consumed, it would read CRLF plus the entire following line - that's 12 bytes (7 + CRLF + 0 + CRLF) in the spec example. Obviously, this would delay receiving a chunk until the chunk size of the following chunk is sent. As long as a completely sent chunk is immediately followed by CRLF, there is no issue. And I figured it should only be treated as a valid chunk if it is properly delimited by CRLF. But how to handle this is not clearly stated in the RFC. Similarly as we will properly accept CRLF in place of LF it probably makes sense to not strictly require a complete chunk to be delimited but accept it without progressing forward in the stream. In golang/go#17355 this is explained that the read method should not have to wait once it has received some data it can deliver to the sender. |
That's not an assumption I want to make. Bad implementations won't do this I'm certain. We should always return data when |
Agreed. That's changed in the implementation now. |
Oh, sorry, I missed that commit. |
Reading from
HTTP::ChunkedContent
now raisesIO::Error
if the IO ends before the entire size of a chunk is read.Fixes #5852