
[WIP] Support http server request streaming #3395

Closed

Conversation

@asterite (Member) commented Oct 8, 2016

(work in progress: do not merge)

Right now HTTP::Server has a problem: it always consumes the whole request body into a String. So if you POST a big body to a server, memory consumption goes very high.

To solve this, this PR splits HTTP::Request into two classes: HTTP::Client::Request and HTTP::Server::Request. The HTTP::Server::Request exposes two properties:

  • body_io : IO? — an IO for the request's body. Will be nil if the request has no body (for example in a GET request)
  • body : String — if you call this, it consumes body_io, gives you back the body as a String, and memoizes it

The server will automatically read the request for you (without storing it in a string) if you write to the response. I tried posting big strings to the server: memory consumption used to climb to a few gigabytes, and now it stays stable at 1 MB.
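As a sketch of what a handler could look like under this API (the handler body and file path are hypothetical; body_io is the property this PR introduces):

```crystal
require "http/server"

# Hypothetical handler: stream the request body straight to disk
# instead of buffering it as a String.
server = HTTP::Server.new(8080) do |context|
  if body_io = context.request.body_io
    File.open("/tmp/upload.bin", "w") do |file|
      IO.copy(body_io, file)
    end
  end
  context.response.print "ok"
end
server.listen
```

Memory stays flat regardless of the upload size, since only the copy buffer is ever held in memory.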

This is a WIP because:

  • Maybe we could remove the body property, because invoking it can potentially bring back the old problem. Invoking request.body_io.try &.gets_to_end isn't very complex either, and it makes you think "Hmm... this could potentially load the whole thing into memory". In most cases you can feed the IO directly to a parser, for example to parse JSON or multipart data.
  • I copied HTTP::Request into the other two types, so a lot of methods are currently duplicated; maybe it doesn't make sense to keep the same methods in both types
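For example, the JSON case above could look like this (a sketch; request stands for an HTTP::Server::Request as proposed in this PR):

```crystal
require "json"

# Parse the body directly from the IO, never materializing
# the full payload as a String.
if body_io = request.body_io
  data = JSON.parse(body_io)
  pp data
end
```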

Once we merge this, one last thing remains: supporting streaming HTTP client requests. That is, if you want to do a big POST, you currently have to build a big string; you can't write directly into the socket. I imagine providing a callback on HTTP::Client::Request where you write to the IO would be OK. But that's the subject of another PR :-)

This change is of course backwards incompatible, but necessary.

@RX14 (Member) commented Oct 8, 2016

I think we should get rid of body_io on both the client response and the server request. The body of a request should be an IO, to force developers to think about the correct way to handle requests efficiently. Legitimizing reading HTTP bodies in as strings may create an environment where Crystal webapps treat HTTP bodies as strings first and IOs later, which is both inefficient and dangerous for memory.

@asterite (Member, Author) commented Oct 8, 2016

@RX14 I think for an HTTP client this is less dangerous. For example, if I want to consume GitHub's API to fetch a repo's info, the response will be relatively small, so that won't be a problem, and getting the body as a string is pretty convenient. Parsing JSON from a small string is also much faster than from an IO, mostly because you can create string subslices of the original string, which you can't (at least not easily) do with an IO.

For a server it's a bit more dangerous, because someone can send a big body and you can't control that. So in that case I agree that the body should only be provided as an IO.

@RX14 (Member) commented Oct 8, 2016

Unfortunately, then you have an inconsistency. I would argue that the body object of every {server,client} {request,response} should be of the same type, so that all types follow the same basic interface. Having wildly different interfaces for request and response depending on whether it's the client or the server violates the principle of least surprise. Maybe both types should include an HTTP::Request module which provides the basic common methods.

I would also argue that the body should be read-once, and that the request and response types should be immutable structs. This has performance advantages, but it also makes the implementations much simpler, without having to synchronise many different fields with mutable caches.

@sdogruyol (Member)

@asterite I can confirm that memory usage stays the same, even for really big files (1 GB) with multipart parsing.

However, parsing got about 4x slower with this patch.

Before: 6324.3 ms
After: 26427.92 ms

WDYT?

@asterite (Member, Author) commented Oct 9, 2016

@sdogruyol Cool! Parsing of what? :-)

@sdogruyol (Member) commented Oct 9, 2016

@asterite parsing of a 1 GB file (ubuntu-14.04-desktop.img) upload with multipart 😄

@asterite force-pushed the feature/http_server_request_stream branch from 59c35f3 to aa12884 on October 9, 2016
@asterite (Member, Author) commented Oct 9, 2016

@sdogruyol I think it's expected that it's going to be slower, because parsing from a string or MemoryIO is faster than from an IO over a socket. However, it shouldn't be that much slower. I profiled the code and found a few places where I could optimize things. Could you try again? (Warning: I force-pushed.) I don't think it will be as fast as before, but maybe 2x slower or so...

@RX14 (Member) commented Oct 9, 2016

Using a larger buffer size in IO.copy might help. 8 KB is probably a good size to fit in L1 cache.

@sdogruyol (Member) commented Oct 9, 2016

@asterite

Before: 26427.92 ms
Now: 12437.06 ms

More than 2x faster 🎉 But still about 2x slower than before 😿

@RX14 (Member) commented Oct 9, 2016

@sdogruyol Try using:

# 8 KB stack-allocated buffer; copy in chunks until read returns 0 (EOF)
buffer = uninitialized UInt8[8192]
while (len = src.read(buffer.to_slice).to_i32) > 0
  dst.write buffer.to_slice[0, len]
end

instead of IO.copy, where src is the multipart IO and dst is the file.

@sdogruyol (Member)

BTW, I do think this is pretty good for 1.0. After that we can think of ways to speed it up further.

@asterite (Member, Author)

Closed in favor of #3406

@asterite asterite closed this Oct 20, 2016
@asterite asterite deleted the feature/http_server_request_stream branch October 20, 2016 11:05