Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP Multipart/FormData support #3844

Merged
merged 5 commits into from Feb 14, 2017
Merged

Conversation

RX14
Copy link
Contributor

@RX14 RX14 commented Jan 5, 2017

Lets try again!

@@ -223,6 +223,24 @@ module HTTP
nil
end

# Dequotes a RFC2616 quoted-string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe show example code?


# Finite State Machine diagram: https://gist.github.com/RX14/221c1edfa98d1196711515d4b5c264eb

# Generates a content type header with multipart subtype *subtype*, and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe "Returns"? I initially thought this was appended to the generated content (because it says "generates" and this is a Generator class)

end

# Yields an IO that can be used to write to a body part which is appended
# to the multipart message with the given *headers*. Throws is `#finish` or
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Throws if"

require "spec"
require "http"

def mp_parse(delim, data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private def

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was written before those existed!

begin
yield headers, body_io
ensure
# Read to end of body
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here you can use IO#skip_to_end method that I added some days ago

(I'm happy to see more cases where this method is useful ^_^)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think i'll send a PR which increases the buffer size of skip_to_end, gets_to_end so they trigger IO::Buffered to skip the buffer (>=4096).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. I can commit that directly. What buffer size would be good?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this as a commit to this PR, you can cherry-pick it to master if you want.

# form_data = "--aA40\r\nContent-Disposition: form-data; name=\"field1\"\r\n\r\nfield data\r\n--aA40--"
# HTTP::FormData.parse(IO::Memory.new(form_data), "aA40") do |field, io|
# field # => "field1"
# data # => "field data"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

io.gets_to_end # => "field data"

#
# HTTP::FormData.parse(request) do |field, io|
# field # => "field1"
# data # => "field data"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

io.gets_to_end # => "field data"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also missing how to access the meta and headers variables and what they are. Same above.

@asterite
Copy link
Member

asterite commented Jan 6, 2017

@RX14 Overall looks really good, I really like it! I don't know why I rejected it some time ago... sorry about that 😊

I made a few comments for fixes. After that I'll merge it, so we can have it for 0.20.4. Thank you for this! 🎉

Copy link
Contributor

@ysbaddaden ysbaddaden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to look good overall. It's quite a large pull request, so I didn't delve too deep, yet noted a few things.

I would rename "generate" to "build", which would fit better in the stdlib, overall (e.g. String.build, XML.build, ...).

The public API looks good. We can quickly iterate the fields, values and have direct IO access. Though making the IO access required is maybe too much. Yielding many variables, too. Maybe a single Data struct, with #headers, #name, #value (aka io.gets_to_end or something more efficient), #io, and moving methods from Metadata directly to this struct would be nicer to work with.

Last but no least, everything is documented. Yet, I'm missing some high level examples for everyday use cases:

  • how do I build/send a multipart body with HTTP::Client (string, file, ...);
  • how do I parse the multipart body of a HTTP::Request (string, file, ...).

end

# Parses the multipart boundary from the value of the Content-Type header,
# or nil if the boundary was not found.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about?

Extracts the multipart boundary from the Content-Type header.
May return nil if the boundary was not found.

#
# Please note that the IO object yielded by `#next` is only valid until the
# block returns.
class Parser
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about a Parser#each iterator?

# Yields each part until the whole multipart message is consumed. See `#next` for details.
def each
  while has_next?
    self.next do |headers, io, meta, headers|
      yield headers, io, meta, headers
    end
  end
end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That usecase is meant to be HTTP::Multipart.parse.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it's missing headers and meta?

end

# Finalizes the multipart message, this method must be called before the
# generated multipart message written to the IO is considered valid.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe?

this method must be called to properly end the multipart message.

#
# HTTP::FormData.parse(request) do |field, io|
# field # => "field1"
# data # => "field data"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also missing how to access the meta and headers variables and what they are. Same above.

# io.to_s # => "HTTP/1.1 200 OK\r\nContent-Type: multipart/form-data; boundary=\"boundary\"\r\n ...
# ```
#
# See: `FormData::Generator`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. An example explaining how to create/send a multipart request with HTTP::Client would be more interesting (common) than a multipart response (quite uncommon).
  2. The example could use generator.boundary (automatically generated) which is a better practice than the fixed "boundary" string. The example may then use whatever boundary that would have been generated.

def self.parse(io, boundary)
parser = Parser.new(io, boundary)
while parser.has_next?
parser.next { |headers, io| yield headers, io }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it missing meta and headers? If not, how are we supposed to access them? Same below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is plain multipart, not multipart/form-data. Field names and file metadata are specific to the multipart/form-data mime-type.


# Yields a `Multipart::Generator` to the given block, returning the generated
# message as a `String`.
def self.generate(boundary = "--------------------------#{SecureRandom.urlsafe_base64(18)}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about extracting the default boundary generation to a helper method, to avoid repeating it over and over?

@asterite
Copy link
Member

asterite commented Jan 6, 2017

I agree with all of @ysbaddaden comments. Let's postpone this PR for after 0.20.4 because I guess some changes are needed yet and I wanted to prepare a release today.

@RX14
Copy link
Contributor Author

RX14 commented Jan 6, 2017

I've found some more HTTP quoted-string craziness (ref, ref), so i'll take the time to sort that out properly.

@RX14 RX14 force-pushed the feature/multipart2 branch 2 times, most recently from dde4072 to 6a3abb1 Compare January 7, 2017 00:32
@RX14
Copy link
Contributor Author

RX14 commented Jan 7, 2017

Updated the PR with my work today, github is fairly accurate on which points I have addressed and which I haven't. I'll split commits from the PR out into separate PRs on request.

@RX14
Copy link
Contributor Author

RX14 commented Jan 7, 2017

There seems to be some confusion between multipart/* and multipart/form-data. RFC1341 defines that every media type using the multipart type must have a specific format. This format consists of a sequence of body-parts, each of which consists of any number of headers (including 0) and a body. It does not define any required headers.

The multipart/form-data subtype is an extension of the multipart media type which defines that each body part must contain a Content-Disposition header, which contains the name of the form field and optional metadata which is used when the field value is a file.

For that reason, HTTP::Multipart only yields headers and an IO, whereas HTTP::FormData yields extra data parsed from the Content-Disposition header. I'm probably going to create a HTTP::FormData::Part struct which contains the data from each part.

@ysbaddaden
Copy link
Contributor

Yes, I got confused. Thanks for the explanation.

Could you detail this in the docs for the different classes? Mostly insist on what they each apply to, and maybe note in HTTP::Multipart that we may be looking for HTTP::FormData instead?

In that light, I don't think the module helpers in HTTP::Multipart are needed. Having a few helpers in HTTP::FormData shall be enough. Other formats will use the HTTP::Multipart Parser and Builder classes directly.


struct Part
getter name : String
getter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like there's some code missing in here, I guess it's WIP :)

@RX14
Copy link
Contributor Author

RX14 commented Feb 2, 2017

Sorry for the super-extended delay replying to these issue comments. I've done quite a bit of work on the PR, and it's ready to be reviewed again.

@@ -7,6 +7,11 @@ describe HTTP do
HTTP.parse_time("Sun, 06 Nov 1994 08:49:37 GMT").should eq(time)
end

it "parses RFC1123 without day name" do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space between RFC and its number would do nice ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@RX14
Copy link
Contributor Author

RX14 commented Feb 6, 2017

I've rebased onto master so the CI should be fixed on OSX now. @ysbaddaden, another review would be appreciated :)

@sdogruyol
Copy link
Member

It'd be great to have this at next Crystal release please 🙏 👍

@ysbaddaden
Copy link
Contributor

@RX14 sorry, I've been quite busy lately, and haven't found the time to review this, yet.

# ```
def self.quote_string(string, io)
string.each_byte do |byte|
case QUOTE_LOOKUP_TABLE[byte]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We think it's better here to just do case on the byte, without a lookup table. The code will be clearer and it won't be unsafe (because it uses Pointer).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. Do we know if LLVM will transform case statements to lookup tables?

Copy link
Contributor Author

@RX14 RX14 Feb 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I going crazy or is this a compiler bug: https://gist.github.com/RX14/e2dd192c7cb0e7cfb2278c2c46f3bea3 ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The when takes the value of the comparison, so its comparing against a bool

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it's like you are doing if byte == (byte <= 0x1F)

Copy link
Contributor Author

@RX14 RX14 Feb 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asterite I would have sworn that i've seen case used like that before. I guess I shouldn't code at 1am.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can definitely do that if you omit the case expression. Then it becomes like if elsif but with all the conditions aligned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asterite committed the new version using case, and it's even 4% faster!

# builder.finish
# io.to_s # => "--aA47\r\nContent-Disposition: form-data; name=\"name\"\r\n\r\njoe\r\n--aA47\r\nContent-Disposition: form-data; name=\"upload\"; filename=\"test.txt\"\r\n\r\nfile contents\r\n--aA47--"
# ```
class Builder
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you rename the file (and the specs) to be named "builder" instead of "generator", now that the types are also named "Builder"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry I thought I'd renamed them all! Thanks for the catch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asterite Renamed them

@RX14
Copy link
Contributor Author

RX14 commented Feb 14, 2017

@ysbaddaden next time please can you leave a PR review instead of a commit comment, the commit comments get a bit lost when I force-push.

@RX14
Copy link
Contributor Author

RX14 commented Feb 14, 2017

Interesting: this build failed due to lack of memory https://travis-ci.org/crystal-lang/crystal/jobs/201479637

@asterite asterite added this to the 0.21.0 milestone Feb 14, 2017
@asterite asterite merged commit f0770cd into crystal-lang:master Feb 14, 2017
@RX14
Copy link
Contributor Author

RX14 commented Feb 14, 2017

Finally! 🎉 🎉 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants