Improve HTTP::ChunkedContent implementation #5928

straight-shoota · 2018-04-07T10:06:19Z

Reading from HTTP::ChunkedContent now raises IO::Error if the IO ends before the entire size of a chunk is read.

RX14 · 2018-04-07T10:28:28Z

src/http/content.cr

@@ -72,6 +72,10 @@ module HTTP
      bytes_read = @io.read slice[0, to_read]
      @chunk_remaining -= bytes_read

+      if bytes_read < to_read


What? No you can't assume this! EOF is when bytes_read == 0. Read could always return exactly 1 byte per call and you have to handle that case.

You only want to act when bytes_read == 0 (eof) && @chunk_remaining > 0.

Yeah, that's what I had in peek already.

ysbaddaden · 2018-04-07T16:51:06Z

Well, to be honest there was nothing to fix: the input is invalid so it raised.

Raising an explicit exception is nicer, but if I'd stumble upon them, I wouldn't understand the messages... Maybe something more explicit, such as:

"invalid http chunked content: expected chunk data but got EOF"
"invalid http chunked content: expected chunk size but got EOF"

straight-shoota · 2018-04-07T21:28:46Z

@ysbaddaden you're right, I would've trouble to grasp the meaning as well... and I wrote it this morning 🙈

RX14 · 2018-04-07T21:54:07Z

@ysbaddaden before, you'd get a nil error if the EOF appeared when parsing the chunk size, but not if you got an EOF in the middle of a chunk (the EOF would just get passed through). This PR changes both to an error, so it is adding new errrors.

ysbaddaden

I just remembered we have IO::EOFError and #read_line. I think they'd be enough, and even spare the need for a message. What do you think?

ysbaddaden · 2018-04-07T23:01:04Z

src/http/content.cr

@@ -134,7 +143,7 @@ module HTTP

    private def read_chunk_start
      read_chunk_end
-      @chunk_remaining = @io.gets.not_nil!.to_i(16)
+      read_chunk_remaining


I just remembered that we have read_line that raises an IO::EOFError. Wouldn't it be enough to deal with this?

ysbaddaden · 2018-04-07T23:08:55Z

src/http/content.cr


+      if bytes_read == 0
+        raise IO::Error.new("Invalid HTTP chunked content: expected data but got EOF (missing #{@chunk_remaining} more bytes)")


We should raise an IO::EOFError (I forgot we had it). I'm not sure it needs a message: exception class says it's an unexpected EOF, backtrace says where it came from, and I don't think the expected remaining byte count is important.

I.e. best effort to report an invalid protocol/broken communication and allow to deal with it (log, and/or silently close) better than a blunt nil error.

straight-shoota · 2018-04-08T13:44:10Z

I noticed a few other issues where the implementation was incomplete and added some improvements to comply with RFC 7230.

raises IO::EOFError if a chunk cannot be completely read (I added a specific error message, I figure it doesn't hurt to be specific).
the constructor won't start reading the parent IO and thus no longer raises. Chunks will only be consumed when a read method is called.
chunk extensions (parameters after chunk size, separated by ;) are recognized but ignored.
trailing headers after the final chunk delimiter \0\r\n are parsed and can be accessed as #headers.

bew · 2018-04-08T15:56:36Z

src/http/content.cr

+    end
+
+    private def read_delimited_line
+      line = @io.read_line(16_384, chomp: false)


Where does this number 16_384 come from?

It' 2**14 = 16kiB and a fairly common default value for maximum line length in HTTP header implementations and copied from http/common.cr.

Oh ok, could you add a comment? (and/or put this in a constant?)

Constant please.

I'd suggest to do that in a small subsequent PR together with the other usages.

ysbaddaden

I don't think we have to enforce CRLF. We can rely on IO#read_line to raise the IO::EOFError and to already chomp the line. No need to repeat what's existing.

ysbaddaden · 2018-04-09T08:13:27Z

src/http/content.cr

-      # Read "\r\n"
-      @io.skip(2)
+    private def read_crlf
+      bytes = Bytes.new(2)


bytes = uninitialized UInt8[2]

Otherwise 2 bytes are allocated on the HEAP memory each time the method is called 😨

ysbaddaden · 2018-04-09T08:13:52Z

src/http/content.cr

+    end
+
+    private def read_delimited_line
+      line = @io.read_line(16_384, chomp: false)


Constant please.

ysbaddaden · 2018-04-09T08:22:18Z

src/http/content.cr


-      check_chunk_remaining_is_zero
+      if bytes_read == 0
+        raise IO::EOFError.new("Invalid HTTP chunked content: unexpected end of file")


Maybe the message is a little redundant? (EOFError ~ unexpected EOF).

ysbaddaden · 2018-04-09T08:33:56Z

src/http/content.cr

+        chunk_size = line
+      end
+
+      @chunk_remaining = chunk_size.to_i?(16) || raise IO::Error.new("Invalid HTTP chunked content: invalid chunk size (#{chunk_size.dump})")


Let's avoid allocating a string: "invalid chunk size" is enough, not need to dump the line.

Since we ignore the optional headers, and those are fairly rare, I wonder whether we really have to seek for a potential ;. Maybe that would suffice:

line.to_i(16, strict: false) { raise IO::Error.new }

Yeah, it's not really commonly used. But I'd leave it in to be RFC compliant. It's implemented anyway.

But I wondered whether string allocation can be avoided... maybe if we could read an int from an IO? This could be useful in other cases as well.
A hex encoded Int32 has 8 digits max (+ sign). So it could be stored in a UInt8[9]...

straight-shoota · 2018-04-09T09:57:45Z

@ysbaddaden The RFC explicitly requires CRLF. Some implementations (like Go's) enforce that. But Firefox, Chrome, Edge, curl all accept \n as well, so it's probably okay to be not totally strict.

Sija · 2018-04-09T10:48:45Z

src/http/content.cr

+    # All headers in the trailing headers section will be returned. Applications
+    # need to make sure to ignore them or fail if headers are not allowed
+    # in the chunked trailer part (see [RFC 7230 section 4.1.2](https://tools.ietf.org/html/rfc7230#section-4.1.2)).
+    getter headers : HTTP::Headers = HTTP::Headers.new


Could it be changed to lazy initialized version?

getter headers : HTTP::Headers { HTTP::Headers.new }

bcardiff

Just two minor comments, but it looks great from here.

bcardiff · 2018-04-09T14:43:10Z

src/http/content.cr

+
+    # Returns trailing headers read by this chunked content.
+    #
+    # Thiel value will only be populated once the entire content has been read,


is "thiel" a misspelling here? Or i need to get a new dictionary?

bcardiff · 2018-04-09T14:46:36Z

src/http/content.cr

-      @chunk_remaining = @io.gets.not_nil!.to_i(16)
-      check_last_chunk
-      @read_chunk_start = false
+    private def read_crlf


can we add a spec for allowing just \n as crlf?

RX14 · 2018-04-09T15:56:34Z

spec/std/http/chunked_content_spec.cr

@@ -9,7 +9,7 @@ describe HTTP::ChunkedContent do
    bytes_read = content.read(bytes.to_slice)
    bytes_read.should eq(4)
    String.new(bytes.to_slice).should eq("123\n")
-    mem.pos.should eq(7) # only this chunk was read
+    mem.pos.should eq(9) # only this chunk was read


This spec shouldn't change, see #3270. There's even a nice comment linking to this issue, right next to code you changed later in this PR.

bew · 2018-04-09T19:31:13Z

src/http/content.cr

@@ -61,7 +61,7 @@ module HTTP
    getter headers : HTTP::Headers { HTTP::Headers.new }

    def initialize(@io : IO)
-      @chunk_remaining = 0
+      @chunk_remaining = -1


maybe use nil instead as an "invalid value" ?

It's only an internal variable so it should be fine to use a special value to determine initial state.
If this should be changed, it would probably make more sense to add an additional ivar like @initial_read. That would be more explicit. Not that it matters much, but making it nilable would add 12 bytes, a Bool ivar only 1.

more like 8 bytes for the boolean once you align. We should do the ivar reordering/packing that rust does eventually...

straight-shoota · 2018-04-09T19:53:32Z

@RX14 The issue from #3270 was not compromised, the example provided there already worked with the code before the last commit. For the matter of that issue, it's irrelevant if CRLF at the end of the chunk is read immediately or not (i.e. 7 or 9 bytes) because it is already sent together with the chunk data anyway.

The issue in #3270 (which was fixed in 4e00257) was, when a chunk was consumed, it would read CRLF plus the entire following line - that's 12 bytes (7 + CRLF + 0 + CRLF) in the spec example. Obviously, this would delay receiving a chunk until the chunk size of the following chunk is sent.

As long as a completely sent chunk is immediately followed by CRLF, there is no issue. And I figured it should only be treated as a valid chunk if it is properly delimited by CRLF. But how to handle this is not clearly stated in the RFC. Similarly as we will properly accept CRLF in place of LF it probably makes sense to not strictly require a complete chunk to be delimited but accept it without progressing forward in the stream.

In golang/go#17355 this is explained that the read method should not have to wait once it has received some data it can deliver to the sender.

…l_chunk

RX14 · 2018-04-10T23:48:02Z

's irrelevant if CRLF at the end of the chunk is read immediately or not (i.e. 7 or 9 bytes) because it is already sent together with the chunk data anyway.

That's not an assumption I want to make. Bad implementations won't do this I'm certain. We should always return data when @chunk_remaining == 0, without waiting for the trailing crlf.

straight-shoota · 2018-04-10T23:51:41Z

Agreed. That's changed in the implementation now.

RX14 · 2018-04-10T23:59:26Z

Oh, sorry, I missed that commit.

RX14 requested changes Apr 7, 2018

View reviewed changes

straight-shoota added 2 commits April 7, 2018 10:31

Fix HTTP::ChunkedContent read incomplete chunks

7060730

fixup! Fix HTTP::ChunkedContent read incomplete chunks

907b9ce

straight-shoota force-pushed the jm/fix/5852 branch from ed475eb to 907b9ce Compare April 7, 2018 11:34

RX14 approved these changes Apr 7, 2018

View reviewed changes

Improve error messages

40dd462

RX14 approved these changes Apr 7, 2018

View reviewed changes

ysbaddaden reviewed Apr 7, 2018

View reviewed changes

straight-shoota force-pushed the jm/fix/5852 branch from 5a5fd72 to fc10b25 Compare April 8, 2018 13:29

Improve ChunkedContent implementation

e852e56

straight-shoota force-pushed the jm/fix/5852 branch from fc10b25 to e852e56 Compare April 8, 2018 13:59

straight-shoota changed the title ~~Fix HTTP::ChunkedContent read incomplete chunks~~ Improve HTTP::ChunkedContent implementation Apr 8, 2018

bew reviewed Apr 8, 2018

View reviewed changes

ysbaddaden requested changes Apr 9, 2018

View reviewed changes

straight-shoota added 3 commits April 9, 2018 09:46

Remove enforce CRLF

6710df9

Add const HTTP::MAX_HEADER_SIZE

c49d6e5

Adjust error messages

7076d4a

Sija reviewed Apr 9, 2018

View reviewed changes

Lazy initialize headers

c10c3cc

bcardiff reviewed Apr 9, 2018

View reviewed changes

Add spec for \n

1f61544

bcardiff approved these changes Apr 9, 2018

View reviewed changes

RX14 requested changes Apr 9, 2018

View reviewed changes

Fix issue with delayed chunks

7b070ed

bew reviewed Apr 9, 2018

View reviewed changes

Refactor chunk_bytes_read and @expect_chunk_start into @received_fina…

e1dd7bb

…l_chunk

RX14 approved these changes Apr 11, 2018

View reviewed changes

RX14 added this to the Next milestone Apr 11, 2018

RX14 added kind:bug kind:feature topic:stdlib labels Apr 11, 2018

RX14 merged commit c541aae into crystal-lang:master Apr 11, 2018

straight-shoota deleted the jm/fix/5852 branch April 11, 2018 06:35

straight-shoota mentioned this pull request Apr 11, 2018

Fix HTTP::ChunkedContent#peek return empty bytes at EOF #5943

Merged


		if bytes_read == 0
		raise IO::Error.new("Invalid HTTP chunked content: expected data but got EOF (missing #{@chunk_remaining} more bytes)")

Improve HTTP::ChunkedContent implementation #5928

Improve HTTP::ChunkedContent implementation #5928

Conversation

straight-shoota commented Apr 7, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

RX14 Apr 7, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ysbaddaden commented Apr 7, 2018

straight-shoota commented Apr 7, 2018

RX14 commented Apr 7, 2018

ysbaddaden left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

straight-shoota commented Apr 8, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

straight-shoota Apr 9, 2018 • edited Loading

Choose a reason for hiding this comment

ysbaddaden left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ysbaddaden Apr 9, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

straight-shoota commented Apr 9, 2018

Choose a reason for hiding this comment

bcardiff left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

RX14 Apr 9, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

straight-shoota commented Apr 9, 2018 • edited Loading

RX14 commented Apr 10, 2018

straight-shoota commented Apr 10, 2018

RX14 commented Apr 10, 2018

straight-shoota commented Apr 7, 2018 •

edited

Loading

RX14 Apr 7, 2018 •

edited

Loading

straight-shoota Apr 9, 2018 •

edited

Loading

ysbaddaden Apr 9, 2018 •

edited

Loading

RX14 Apr 9, 2018 •

edited

Loading

straight-shoota commented Apr 9, 2018 •

edited

Loading