Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve HTTP::ChunkedContent implementation #5928

Merged
merged 11 commits into from Apr 11, 2018

Conversation

straight-shoota
Copy link
Member

@straight-shoota straight-shoota commented Apr 7, 2018

Reading from HTTP::ChunkedContent now raises IO::Error if the IO ends before the entire size of a chunk is read.

Fixes #5852

@@ -72,6 +72,10 @@ module HTTP
bytes_read = @io.read slice[0, to_read]
@chunk_remaining -= bytes_read

if bytes_read < to_read
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What? No you can't assume this! EOF is when bytes_read == 0. Read could always return exactly 1 byte per call and you have to handle that case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argh.

Copy link
Contributor

@RX14 RX14 Apr 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You only want to act when bytes_read == 0 (eof) && @chunk_remaining > 0.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's what I had in peek already.

@ysbaddaden
Copy link
Contributor

Well, to be honest there was nothing to fix: the input is invalid so it raised.

Raising an explicit exception is nicer, but if I'd stumble upon them, I wouldn't understand the messages... Maybe something more explicit, such as:

  • "invalid http chunked content: expected chunk data but got EOF"
  • "invalid http chunked content: expected chunk size but got EOF"

@straight-shoota
Copy link
Member Author

@ysbaddaden you're right, I would've trouble to grasp the meaning as well... and I wrote it this morning 🙈

@RX14
Copy link
Contributor

RX14 commented Apr 7, 2018

@ysbaddaden before, you'd get a nil error if the EOF appeared when parsing the chunk size, but not if you got an EOF in the middle of a chunk (the EOF would just get passed through). This PR changes both to an error, so it is adding new errrors.

Copy link
Contributor

@ysbaddaden ysbaddaden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just remembered we have IO::EOFError and #read_line. I think they'd be enough, and even spare the need for a message. What do you think?

@@ -134,7 +143,7 @@ module HTTP

private def read_chunk_start
read_chunk_end
@chunk_remaining = @io.gets.not_nil!.to_i(16)
read_chunk_remaining
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just remembered that we have read_line that raises an IO::EOFError. Wouldn't it be enough to deal with this?


if bytes_read == 0
raise IO::Error.new("Invalid HTTP chunked content: expected data but got EOF (missing #{@chunk_remaining} more bytes)")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should raise an IO::EOFError (I forgot we had it). I'm not sure it needs a message: exception class says it's an unexpected EOF, backtrace says where it came from, and I don't think the expected remaining byte count is important.

I.e. best effort to report an invalid protocol/broken communication and allow to deal with it (log, and/or silently close) better than a blunt nil error.

@straight-shoota
Copy link
Member Author

I noticed a few other issues where the implementation was incomplete and added some improvements to comply with RFC 7230.

  • raises IO::EOFError if a chunk cannot be completely read (I added a specific error message, I figure it doesn't hurt to be specific).
  • the constructor won't start reading the parent IO and thus no longer raises. Chunks will only be consumed when a read method is called.
  • chunk extensions (parameters after chunk size, separated by ;) are recognized but ignored.
  • trailing headers after the final chunk delimiter \0\r\n are parsed and can be accessed as #headers.

@straight-shoota straight-shoota changed the title Fix HTTP::ChunkedContent read incomplete chunks Improve HTTP::ChunkedContent implementation Apr 8, 2018
end

private def read_delimited_line
line = @io.read_line(16_384, chomp: false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does this number 16_384 come from?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It' 2**14 = 16kiB and a fairly common default value for maximum line length in HTTP header implementations and copied from http/common.cr.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh ok, could you add a comment? (and/or put this in a constant?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Constant please.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest to do that in a small subsequent PR together with the other usages.

Copy link
Contributor

@ysbaddaden ysbaddaden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we have to enforce CRLF. We can rely on IO#read_line to raise the IO::EOFError and to already chomp the line. No need to repeat what's existing.

# Read "\r\n"
@io.skip(2)
private def read_crlf
bytes = Bytes.new(2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bytes = uninitialized UInt8[2]

Otherwise 2 bytes are allocated on the HEAP memory each time the method is called 😨

end

private def read_delimited_line
line = @io.read_line(16_384, chomp: false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Constant please.


check_chunk_remaining_is_zero
if bytes_read == 0
raise IO::EOFError.new("Invalid HTTP chunked content: unexpected end of file")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the message is a little redundant? (EOFError ~ unexpected EOF).

chunk_size = line
end

@chunk_remaining = chunk_size.to_i?(16) || raise IO::Error.new("Invalid HTTP chunked content: invalid chunk size (#{chunk_size.dump})")
Copy link
Contributor

@ysbaddaden ysbaddaden Apr 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's avoid allocating a string: "invalid chunk size" is enough, not need to dump the line.

Since we ignore the optional headers, and those are fairly rare, I wonder whether we really have to seek for a potential ;. Maybe that would suffice:

line.to_i(16, strict: false) { raise IO::Error.new }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's not really commonly used. But I'd leave it in to be RFC compliant. It's implemented anyway.

But I wondered whether string allocation can be avoided... maybe if we could read an int from an IO? This could be useful in other cases as well.
A hex encoded Int32 has 8 digits max (+ sign). So it could be stored in a UInt8[9]...

@straight-shoota
Copy link
Member Author

@ysbaddaden The RFC explicitly requires CRLF. Some implementations (like Go's) enforce that. But Firefox, Chrome, Edge, curl all accept \n as well, so it's probably okay to be not totally strict.

# All headers in the trailing headers section will be returned. Applications
# need to make sure to ignore them or fail if headers are not allowed
# in the chunked trailer part (see [RFC 7230 section 4.1.2](https://tools.ietf.org/html/rfc7230#section-4.1.2)).
getter headers : HTTP::Headers = HTTP::Headers.new
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it be changed to lazy initialized version?

getter headers : HTTP::Headers { HTTP::Headers.new }

Copy link
Member

@bcardiff bcardiff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just two minor comments, but it looks great from here.


# Returns trailing headers read by this chunked content.
#
# Thiel value will only be populated once the entire content has been read,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is "thiel" a misspelling here? Or i need to get a new dictionary?

@chunk_remaining = @io.gets.not_nil!.to_i(16)
check_last_chunk
@read_chunk_start = false
private def read_crlf
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a spec for allowing just \n as crlf?

@@ -9,7 +9,7 @@ describe HTTP::ChunkedContent do
bytes_read = content.read(bytes.to_slice)
bytes_read.should eq(4)
String.new(bytes.to_slice).should eq("123\n")
mem.pos.should eq(7) # only this chunk was read
mem.pos.should eq(9) # only this chunk was read
Copy link
Contributor

@RX14 RX14 Apr 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This spec shouldn't change, see #3270. There's even a nice comment linking to this issue, right next to code you changed later in this PR.

@@ -61,7 +61,7 @@ module HTTP
getter headers : HTTP::Headers { HTTP::Headers.new }

def initialize(@io : IO)
@chunk_remaining = 0
@chunk_remaining = -1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe use nil instead as an "invalid value" ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's only an internal variable so it should be fine to use a special value to determine initial state.
If this should be changed, it would probably make more sense to add an additional ivar like @initial_read. That would be more explicit. Not that it matters much, but making it nilable would add 12 bytes, a Bool ivar only 1.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more like 8 bytes for the boolean once you align. We should do the ivar reordering/packing that rust does eventually...

@straight-shoota
Copy link
Member Author

straight-shoota commented Apr 9, 2018

@RX14 The issue from #3270 was not compromised, the example provided there already worked with the code before the last commit. For the matter of that issue, it's irrelevant if CRLF at the end of the chunk is read immediately or not (i.e. 7 or 9 bytes) because it is already sent together with the chunk data anyway.

The issue in #3270 (which was fixed in 4e00257) was, when a chunk was consumed, it would read CRLF plus the entire following line - that's 12 bytes (7 + CRLF + 0 + CRLF) in the spec example. Obviously, this would delay receiving a chunk until the chunk size of the following chunk is sent.

As long as a completely sent chunk is immediately followed by CRLF, there is no issue. And I figured it should only be treated as a valid chunk if it is properly delimited by CRLF. But how to handle this is not clearly stated in the RFC. Similarly as we will properly accept CRLF in place of LF it probably makes sense to not strictly require a complete chunk to be delimited but accept it without progressing forward in the stream.

In golang/go#17355 this is explained that the read method should not have to wait once it has received some data it can deliver to the sender.

@RX14
Copy link
Contributor

RX14 commented Apr 10, 2018

's irrelevant if CRLF at the end of the chunk is read immediately or not (i.e. 7 or 9 bytes) because it is already sent together with the chunk data anyway.

That's not an assumption I want to make. Bad implementations won't do this I'm certain. We should always return data when @chunk_remaining == 0, without waiting for the trailing crlf.

@straight-shoota
Copy link
Member Author

Agreed. That's changed in the implementation now.

@RX14
Copy link
Contributor

RX14 commented Apr 10, 2018

Oh, sorry, I missed that commit.

@RX14 RX14 added this to the Next milestone Apr 11, 2018
@RX14 RX14 merged commit c541aae into crystal-lang:master Apr 11, 2018
@straight-shoota straight-shoota deleted the jm/fix/5852 branch April 11, 2018 06:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants