-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HTTP: respect gzip in serialization #4957
Conversation
0b939bd
to
e6a14ee
Compare
@@ -139,6 +139,13 @@ module HTTP | |||
end | |||
|
|||
def self.serialize_headers_and_string_body(io, headers, body) | |||
if headers.includes_word?("Content-Encoding", "gzip") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should compare with ==
instead of includes_word?
here to match what we do when decompressing (we only accept "gzip" exactly). This is because Content-Encoding
can represent multiple encodings applied in sequence. We only want to support this when the only encoding is gzip - not just when gzip is part of the encoding sequence. THis is very rare in practice, but should be in consideration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! You are right. I'll fix it. Thanks.
if headers.includes_word?("Content-Encoding", "gzip") | ||
body = IO::Memory.new.tap do |buf| | ||
gzip = Gzip::Writer.new(buf) | ||
gzip.print body |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a bit confusing with the two bodies and the tap, could you clean this up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup. Finally I got two candidates about this.
body = IO::Memory.new.tap do |buf|
Gzip::Writer.open buf, &.print body
end
and
buf = IO::Memory.new
Gzip::Writer.open buf, &.print body
body = buf
I like the latter. Because although the code is a bit redundant, it's easy to read due to no nested bodies. Right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would format it a bit nicer though, the whitespace is very confusing here.
buf = IO::Memory.new
Gzip::Writer.open(buf, &.print(body))
body = buf
I think there's actually a bunch of cases where we use |
@@ -249,6 +249,17 @@ class HTTP::Client | |||
response.body.should eq("") | |||
end | |||
|
|||
it "gzip body if has 'gzip' in Content-Encoding header" do | |||
gzipped_body = String.build do |io| | |||
Gzip::Writer.new(io).tap { |io| io.print "Hello"; io.close } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can do:
Gzip::Writer.open io, &.print("Hello")
I'm not sure this is OK. What if I send an already compressed gzip to HTTP::Client and add a |
asterite:
Yes. I understood what you are worried about. Indeed, there can be a case where the http server already has a compressed body and set We must care it. |
@maiha raven.cr does exactly that. |
I realized that the essence of the problem is that there would be a contradiction between
So it'd be better
Here is an image for https://github.com/crystal-lang/crystal/blob/master/src/http/common.cr#L45-L48 encoding = headers["Content-Encoding"]?
case encoding
when "gzip"
body = Gzip::Reader.new(body, sync_close: true)
+ headers.delete("Content-Encoding") Thought? |
I like the idea of the headers being consistent with the way the HTTP body appears to the reader, instead of being the original HTTP headers. Although keeping the original HTTP headers around would probably be a good idea. It has the possibility of being confusing though with 2 sets of HTTP headers. It should be well-documented. |
Go seems to remove the header but set a boolean in the response indicating that this happened. Check here https://golang.org/pkg/net/http/ for Go doesn't automatically compress content, and neither should we. |
Thinking about it further, we already have the |
Yup, I agreed that we should remove headers but should not compress body in |
Can we either simply force-push this PR with the new implementation, or close this now? We tend to accumulate a lot of "resolved" issues and i'd like to be more proactive with closing. |
I see. Close this first. I'll create an another PR when it's ready. Thanks. |
This allows
HTTP::Client::Response
to provide transparent transformations betweento_io
andfrom_io
about gzip.case of
Content-Encoding: gzip
Assume that API server returned a http response with
binary(gzipped data)
body.from_io
to_io
from_io
text
Invalid gzip header
to_io
can't restore original response data.This PR
to_io
generates gzippedbinary
whenContent-Encoding: gzip
, otherwise generatestext
in the same way as before.from_io
binary
to_io
binary
from_io
binary
binary
to_io
binary
Thanks.