
[WIP] Support http server request streaming #3395

Closed

Conversation

@asterite (Member) commented Oct 8, 2016

(work in progress: do not merge)

Right now HTTP::Server has a problem: it always consumes the whole request body into a String. So if you POST a big body to a server, memory consumption goes very high.

To solve this, this PR splits HTTP::Request into two classes: HTTP::Client::Request and HTTP::Server::Request. The HTTP::Server::Request exposes two properties:

  • body_io : IO? — an IO for the request's body. Will be nil if the request has no body (for example in a GET request)
  • body : String — if you call this, it consumes body_io, gives you back the body as a String, and memoizes it

The server will automatically read the request for you (without storing it in a string) if you write to the response. I tried posting big strings to the server: memory consumption used to climb to a few gigabytes, and now it stays stable at 1 MB.
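As a sketch of what a handler could look like under this API (the handler body and file path are hypothetical; body_io is the property this PR introduces):

```crystal
require "http/server"

# Hypothetical handler: stream the request body straight to disk
# instead of buffering it as a String.
server = HTTP::Server.new(8080) do |context|
  if body_io = context.request.body_io
    File.open("/tmp/upload.bin", "w") do |file|
      IO.copy(body_io, file)
    end
  end
  context.response.print "ok"
end
server.listen
```

Memory stays flat regardless of the upload size, since only the copy buffer is ever held in memory.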

This is a WIP because:

  • Maybe we could remove the body property, because invoking it can potentially bring back the old problem. Invoking request.body_io.try &.gets_to_end isn't very complex either, and it makes you think "Hmm... this could potentially load the whole thing into memory". In most cases you can feed the IO directly to a parser, for example to parse JSON or multipart data.
  • I copied HTTP::Request into the other two types, so a lot of methods are currently duplicated; maybe it doesn't make sense to keep the same methods in both types
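For example, the JSON case above could look like this (a sketch; request stands for an HTTP::Server::Request as proposed in this PR):

```crystal
require "json"

# Parse the body directly from the IO, never materializing
# the full payload as a String.
if body_io = request.body_io
  data = JSON.parse(body_io)
  pp data
end
```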

Once we merge this, one last thing remains: supporting streaming HTTP client requests. That is, if you want to do a big POST, you currently have to build a big string; you can't write directly into the socket. I imagine providing a callback on HTTP::Client::Request where you write to the IO would be OK. But that's the subject of another PR :-)

This change is of course backwards incompatible, but necessary.

@RX14 (Member) commented Oct 8, 2016

I think we should get rid of body_io on both the client response and the server request. The body of a request should be an IO, to force developers to think about the correct way to handle requests efficiently. Legitimizing reading HTTP bodies in as strings may create an environment where Crystal webapps treat HTTP bodies as strings first and IOs later, which is both inefficient and dangerous for memory.

@asterite (Member, Author) commented Oct 8, 2016

@RX14 I think for an HTTP client this is less dangerous. For example, if I want to consume GitHub's API to fetch a repo's info, the response will be relatively small, so that won't be a problem, and getting the body as a string is pretty convenient. Parsing JSON from a small string is also much faster than from an IO, mostly because you can create string subslices of the original string, which you can't (at least not easily) do with an IO.

For a server it's a bit more dangerous, because someone can send a big body and you can't control that. So in that case I agree that the body should only be provided as an IO.

@RX14 (Member) commented Oct 8, 2016

Unfortunately, then you have an inconsistency. I would argue that the body object of every {server,client} {request,response} should be of the same type, so that all types follow the same basic interface. Having wildly different interfaces for request and response depending on whether it's the client or the server violates the principle of least surprise. Maybe both types should include an HTTP::Request module which provides the basic common methods.

I would also argue that the body should be read-once, and that the request and response types should be immutable structs. This has performance advantages, but it also makes the implementations much simpler, without having to synchronise many different fields with mutable caches.

@sdogruyol (Member)

@asterite I can confirm that memory usage stays the same, even for really big files (1 GB) with multipart parsing.

However, parsing got about 4x slower with this patch.

Before: 6324.3 ms
After: 26427.92 ms

WDYT?

@asterite (Member, Author) commented Oct 9, 2016

@sdogruyol Cool! Parsing of what? :-)

@sdogruyol (Member) commented Oct 9, 2016

@asterite parsing of a 1 GB file (ubuntu-14.04-desktop.img) upload with multipart 😄

@asterite force-pushed the feature/http_server_request_stream branch from 59c35f3 to aa12884 on October 9, 2016
@asterite (Member, Author) commented Oct 9, 2016

@sdogruyol I think it's expected that it's going to be slower, because parsing from a string or MemoryIO is faster than from an IO over a socket. However, it shouldn't be that much slower. I profiled the code and found a few places where I could optimize things. Could you try again? (Warning: I force-pushed.) I don't think it will be as fast as before, but maybe 2x slower or so...

@RX14 (Member) commented Oct 9, 2016

Using a larger buffer size in IO.copy might help. 8 KB is probably a good size to fit in L1 cache.

@sdogruyol (Member) commented Oct 9, 2016

@asterite

Before: 26427.92 ms
Now: 12437.06 ms

More than 2x faster 🎉 But still about 2x slower than before 😿

@RX14 (Member) commented Oct 9, 2016

@sdogruyol Try using:

# 8 KB stack-allocated buffer; copy in chunks until read returns 0 (EOF)
buffer = uninitialized UInt8[8192]
while (len = src.read(buffer.to_slice).to_i32) > 0
  dst.write buffer.to_slice[0, len]
end

instead of IO.copy, where src is the multipart IO and dst is the file.

@sdogruyol (Member)

BTW, I do think this is pretty good for 1.0. After that we can think of ways to speed it up further.

@asterite (Member, Author)

Closed in favor of #3406

@asterite asterite closed this Oct 20, 2016
@asterite asterite deleted the feature/http_server_request_stream branch October 20, 2016 11:05