Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: HTTP::FormData parsing of multipart subparts #5683

Closed
wants to merge 2 commits into from
Closed

Fix: HTTP::FormData parsing of multipart subparts #5683

wants to merge 2 commits into from

Conversation

stakach
Copy link
Contributor

@stakach stakach commented Feb 7, 2018

For example post data see https://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.2
In a multi-file upload the files are uploaded as a subpart with Content-Disposition: file; filename="file1.txt" this patch allows parsing of subparts using the standard library.

References: https://tools.ietf.org/html/rfc2388

   either of the "content-disposition: form-data"
   header or, in the case of multiple files, in a "content-disposition:
   file" header of the subpart.

For example post data see https://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.2
In a multi-file upload the files are uploaded as a subpart with `Content-Disposition: file; filename="file1.txt"` this patch allows parsing of subparts using the standard library.

References: https://tools.ietf.org/html/rfc2388

```
   either of the "content-disposition: form-data"
   header or, in the case of multiple files, in a "content-disposition:
   file" header of the subpart.
```
@@ -133,7 +133,7 @@ module HTTP::FormData

parts = content_disposition.split(';')
type = parts[0]
raise Error.new("Invalid Content-Disposition: not form-data") unless type == "form-data"
raise Error.new("Invalid Content-Disposition: not form-data or file") unless ["form-data", "file"].includes?(type)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you incorrectly removed 2 spaces at the beginning.

Also, you should use a Tuple instead of an array to avoid having to allocate the array, so something like: {"form-data", "file"}.includes? type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, will update

@bew
Copy link
Contributor

bew commented Feb 7, 2018

I don't have enough knowledge on how HTTP is implemented and how this might be used to validate the relevance & implementation of your PR, but language-wise it looks better yes 😃

@jhass
Copy link
Member

jhass commented Feb 7, 2018

Actually, an added spec or two would be nice!

@jhass jhass requested a review from RX14 February 7, 2018 08:42
@straight-shoota
Copy link
Member

The error message actually already states that it should work with bot form-data and file yet the implementation only accepts the former.

The change is correct, but it should be supplemented by a spec.

@RX14
Copy link
Contributor

RX14 commented Feb 7, 2018

This is incorrect as per the RFC. Look at the example:

   Content-Type: multipart/form-data; boundary=AaB03x

   --AaB03x
   Content-Disposition: form-data; name="submit-name"

   Larry
   --AaB03x
   Content-Disposition: form-data; name="files"
   Content-Type: multipart/mixed; boundary=BbC04y

   --BbC04y
   Content-Disposition: file; filename="file1.txt"
   Content-Type: text/plain

   ... contents of file1.txt ...
   --BbC04y
   Content-Disposition: file; filename="file2.gif"
   Content-Type: image/gif
   Content-Transfer-Encoding: binary

   ...contents of file2.gif...
   --BbC04y--
   --AaB03x--

This is a multipart/mixed with each part containing a Content-Disposition: file header contained inside a multipart/form-data.

We should discuss how we want to handle this, but this is absolutely the incorrect fix and will not parse the above data.

@RX14
Copy link
Contributor

RX14 commented Feb 7, 2018

To make it clear what I mean, let me indent:

Content-Type: multipart/form-data; boundary=AaB03x

--AaB03x
Content-Disposition: form-data; name="submit-name"

Larry
--AaB03x
Content-Disposition: form-data; name="files"
Content-Type: multipart/mixed; boundary=BbC04y

   --BbC04y
   Content-Disposition: file; filename="file1.txt"
   Content-Type: text/plain

   ... contents of file1.txt ...
   --BbC04y
   Content-Disposition: file; filename="file2.gif"
   Content-Type: image/gif
   Content-Transfer-Encoding: binary

   ...contents of file2.gif...
   --BbC04y--
--AaB03x--

Copy link
Contributor

@RX14 RX14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A spec would have shown this was the incorrect fix.

@straight-shoota
Copy link
Member

@RX14 Good catch. From the initial description of the method and the error message it seemed obvious to be the intended behaviour. But when you look at the context of how it's used, it is clear that it won't work this way.

There needs to be extra effort to support multiple file payloads.

@RX14
Copy link
Contributor

RX14 commented Feb 7, 2018

I think most actual browsers don't do this, they just send multiple fields with the same name, instead of nesting multipart/*

@stakach
Copy link
Contributor Author

stakach commented Feb 7, 2018

@RX14 to your point, I am extracting the indented data and then wanting to parse it. Please see: https://github.com/spider-gazelle/action-controller/blob/master/src/action-controller/body_parser.cr#L61

So this minor modification solves the issue. Otherwise I have to have an independent copy of this parsing code for something covered by the specification.

@stakach
Copy link
Contributor Author

stakach commented Feb 7, 2018

i.e. even if in the future if effort is taken to support multiple file payloads you can use this parsing code to extract those files.

Supporting the two types of Content-Dispositions makes sense at this level in the process.

@RX14
Copy link
Contributor

RX14 commented Feb 7, 2018

This class is for parsing multipart/form-data though, you're trying to parse multipart/mixed so you should use HTTP::Multipart. So this error is correct: you're trying to parse the incorrect format.

We should have some way to parse a Content-Disposition: file header, I entirely agree. However I think the way we want to do that should be discusses in a new issue, and this PR closed.

@RX14
Copy link
Contributor

RX14 commented Feb 7, 2018

The HTTP::FormData parser is actually a tiny wrapper around HTTP::Multipart so it should be easy to switch.

@stakach
Copy link
Contributor Author

stakach commented Feb 7, 2018

Thanks for the tip @RX14 - I'll re-factor using HTTP::Multipart

@stakach stakach closed this Feb 7, 2018
@RX14
Copy link
Contributor

RX14 commented Feb 7, 2018

@stakach You'll still have to parse the Content-Disposition header yourself. This is absolutely non-ideal. Please would you submit a feature-request for a way to parse Content-Dispostion: file headers? Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants