Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP: respect gzip in serialization #4957

Closed
wants to merge 1 commit into from

Conversation

maiha
Copy link
Contributor

@maiha maiha commented Sep 12, 2017

This allows HTTP::Client::Response to provide transparent transformations between to_io and from_io about gzip.

case of Content-Encoding: gzip

Assume that API server returned a http response with binary(gzipped data) body.

command source (body) operation generated (body)
1st from_io binary decodes binary text
1st to_io text just combines header and body text
2nd from_io text decodes text Invalid gzip header

to_io can't restore original response data.

This PR

to_io generates gzipped binary when Content-Encoding: gzip, otherwise generates text in the same way as before.

command source (body) operation generated (body)
1st from_io binary decodes binary text
1st to_io text combines header and gzipped body binary
2nd from_io binary decodes binary text
2nd to_io text combines header and gzipped body binary
3rd ... ... ... ...

Thanks.

@@ -139,6 +139,13 @@ module HTTP
end

def self.serialize_headers_and_string_body(io, headers, body)
if headers.includes_word?("Content-Encoding", "gzip")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should compare with == instead of includes_word? here to match what we do when decompressing (we only accept "gzip" exactly). This is because Content-Encoding can represent multiple encodings applied in sequence. We only want to support this when the only encoding is gzip - not just when gzip is part of the encoding sequence. THis is very rare in practice, but should be in consideration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! You are right. I'll fix it. Thanks.

if headers.includes_word?("Content-Encoding", "gzip")
body = IO::Memory.new.tap do |buf|
gzip = Gzip::Writer.new(buf)
gzip.print body
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a bit confusing with the two bodies and the tap, could you clean this up?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. Finally I got two candidates about this.

      body = IO::Memory.new.tap do |buf|
        Gzip::Writer.open buf, &.print body
      end

and

      buf = IO::Memory.new
      Gzip::Writer.open buf, &.print body
      body = buf

I like the latter. Because although the code is a bit redundant, it's easy to read due to no nested bodies. Right?

Copy link
Contributor

@RX14 RX14 Sep 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would format it a bit nicer though, the whitespace is very confusing here.

      buf = IO::Memory.new
      Gzip::Writer.open(buf, &.print(body))
      body = buf

@RX14
Copy link
Contributor

RX14 commented Sep 12, 2017

I think there's actually a bunch of cases where we use body_io where this rule will have no effect too. Please check serialize_headers_and_body to make sure this works when we dont call serialize_headers_and_string_body.

@@ -249,6 +249,17 @@ class HTTP::Client
response.body.should eq("")
end

it "gzip body if has 'gzip' in Content-Encoding header" do
gzipped_body = String.build do |io|
Gzip::Writer.new(io).tap { |io| io.print "Hello"; io.close }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do:

Gzip::Writer.open io, &.print("Hello")

@asterite
Copy link
Member

I'm not sure this is OK. What if I send an already compressed gzip to HTTP::Client and add a Content-Encoding: gzip header? The client will re-compress it, right?

@maiha
Copy link
Contributor Author

maiha commented Sep 12, 2017

asterite:

The client will re-compress it, right?

Yes.

I understood what you are worried about. Indeed, there can be a case where the http server already has a compressed body and set Content-Encoding to gzip. And in that case we will be in trouble.

We must care it.

@Sija
Copy link
Contributor

Sija commented Sep 12, 2017

@maiha raven.cr does exactly that.

@maiha
Copy link
Contributor Author

maiha commented Sep 12, 2017

I realized that the essence of the problem is that there would be a contradiction between headers and body on HTTP::Client::Response. And it comes from from_io.

  • (in server context) httpd puts response as (gzip, gzipped body): that's OK
  • (in client context) from_io deflates response as (gzip, plain): inconsistency!

So it'd be better from_io remove Content-Encoding: gzip from headers after deflates.

  • (when gzip) from_io defaltes response as (none, plain): that's OK

Here is an image for https://github.com/crystal-lang/crystal/blob/master/src/http/common.cr#L45-L48

            encoding = headers["Content-Encoding"]?
            case encoding
            when "gzip"
              body = Gzip::Reader.new(body, sync_close: true)
+             headers.delete("Content-Encoding")

Thought?

@RX14
Copy link
Contributor

RX14 commented Sep 12, 2017

I like the idea of the headers being consistent with the way the HTTP body appears to the reader, instead of being the original HTTP headers. Although keeping the original HTTP headers around would probably be a good idea. It has the possibility of being confusing though with 2 sets of HTTP headers. It should be well-documented.

@asterite
Copy link
Member

Go seems to remove the header but set a boolean in the response indicating that this happened. Check here https://golang.org/pkg/net/http/ for Uncompressed. We should probably do the same.

Go doesn't automatically compress content, and neither should we.

@RX14
Copy link
Contributor

RX14 commented Sep 13, 2017

Thinking about it further, we already have the CompressHandler to handle compressing responses. I agree we shouldn't compress here.

@maiha
Copy link
Contributor Author

maiha commented Sep 14, 2017

Yup, I agreed that we should remove headers but should not compress body in to_io.
I'll create a new PR about it. And this PR will be closed soon. Thanks.

@RX14
Copy link
Contributor

RX14 commented Sep 14, 2017

Can we either simply force-push this PR with the new implementation, or close this now? We tend to accumulate a lot of "resolved" issues and i'd like to be more proactive with closing.

@maiha
Copy link
Contributor Author

maiha commented Sep 14, 2017

I see. Close this first. I'll create an another PR when it's ready. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants