Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Etag support in HTTP::StaticFileHandler #6145

Merged
merged 1 commit into from Jun 15, 2018

Conversation

rwojsznis
Copy link
Contributor

Hi all

I was trying to extract some logic in Kemal - kemalcr/kemal#440 - it seems at this point static file handler is kinda hard to re-use as it's quite coupled - a lot of stuff is going on there at the moment.

Maybe injecting conditional into handle arguments might be a good idea? So you can basically plug-in anything you want there (you could also remove caching by injecting some class that will simply return false for matches? class method).

It's just a totally random proposal / proof of concept (wip) - please let me know what you think about it so we can proceed with it / change the approach or simply reject it 💣

Keep up great work, I'm really enjoying having 'ruby experience' in a compiled language - it's truly awesome ❤️

@rwojsznis rwojsznis changed the title [WIP] Redo HTTP::StaticFileHandler [WIP/Proposal] Redo HTTP::StaticFileHandler May 31, 2018
@straight-shoota
Copy link
Member

It's certainly a good idea to clean up and reorder some of the logic in that class. But I don't see the point in having a Conditional type. It just complicates things. This static file handler doesn't need to be super flexible. It's just a default implementation. It doesn't need to support complex scenarios (although solutions to that can be based upon).

It's particularly odd that this PR adds support for Etag/If-None-Match but is seemingly not to be able to use it alongside Last-Modified/If-Modified-Since simultaneously. That's not really helpful.

In my opinion, Etag support should be added without sacrificing any of the current behaviour. It could make sense to extract some features to private instance methods (for example matches? and add_headers) that isolate behaviour and can be used for customization in inheriting classes (if needed).

@rwojsznis
Copy link
Contributor Author

Thanks for quick feedback! We can pack logic internally and still keep it extendable - I thought why not making it more configurable out of the box.

As for using both simultaneously - I will give it another go; by (quickly :x) looking at rfc7232 before submitting that proposal I thought it made no sense because of:

A recipient MUST ignore If-Modified-Since if the request contains an
If-None-Match header field; the condition in If-None-Match is
considered to be a more accurate replacement for the condition in
If-Modified-Since, and the two are only combined for the sake of
interoperating with older intermediaries that might not implement
If-None-Match.

But I guess we still have to deal with some older clients and support both at the same time on server side 🤔 good point!

Will ping you hopefully this week for further discussion and I'm still hoping for more feedback ;).

@straight-shoota
Copy link
Member

Yes, some clients or intermediaries might only use one of them so we should support both. And If-None-Match takes precedence. Please also be aware that its value can be a number of etags and you need to validate each of them.

@rwojsznis
Copy link
Contributor Author

Another quick go, please lemme know wdyt - I'm still collecting feedback here (will rebase whole thing anyway)

Note/Fun (?) fact: Implementation in Rails contradicts specification linked here, but I guess this should follow rfc 🙊

@@ -6,6 +6,8 @@ require "uri"
class HTTP::StaticFileHandler
include HTTP::Handler

ALLOWED_METHODS = ["GET", "HEAD"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ALLOWED_METHODS = %w(GET HEAD)

@@ -24,12 +26,13 @@ class HTTP::StaticFileHandler
end

def call(context)
unless context.request.method == "GET" || context.request.method == "HEAD"
request_method = context.request.method
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Useless assign.

Copy link
Member

@straight-shoota straight-shoota left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good overall 👍 I've added a few line comments about simple improvements.

A few notes regarding general design:

  • I don't see a benefit from isolating the behaviour in an additional Cache type. These methods should just be (private) instance methods in StaticFileHandler. They're of no use outside of that class.
  • Combining methods etag? and modified_since? in match? would avoid lot's of boiler plate and would even make it better understandable what's going on. IMHO it doesn't make sense to separate these methods.

Support for a list of etags in If-None-Match header and validating each of them would be great, but that can be added when the general design is settled (should be fairly easy).

@@ -6,7 +6,10 @@ require "uri"
class HTTP::StaticFileHandler
include HTTP::Handler

ALLOWED_METHODS = ["GET", "HEAD"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why we need to expose this as a constant. But it should be a tuple if so.

if @fallthrough
call_next(context)
else
context.response.status_code = 405
context.response.headers.add("Allow", "GET, HEAD")
context.response.headers.add("Allow", ALLOWED_METHODS.join(", "))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generating that string dynamically is useless as the allowed methods are not gonna change. Better leave it as before and scrap ALLOWED_METHODS.

response.status_code.should eq(200)
response.headers["Last-Modified"].should eq(HTTP.rfc1123_date(File.info("#{__DIR__}/static/test.txt").modification_time))
response.body.should eq(File.read("#{__DIR__}/static/test.txt"))
end
end

context "with If-None-Match header" do
it "should return 304 Not Modified if header matches etag" do
etag = %{W/"#{File.info("#{__DIR__}/static/test.txt").modification_time.epoch.to_s}"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't generate a etag manually. How it's generated is irrelevant to this spec.

Just send a prior request to get the etag.

context "with both If-None-Match and If-Modified-Since headers" do
it "ignores If-Modified-Since as specified in RFC 7232" do
headers = HTTP::Headers.new
etag = %{W/"#{File.info("#{__DIR__}/static/test.txt").modification_time.epoch.to_s}"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

end

context "with both If-None-Match and If-Modified-Since headers" do
it "ignores If-Modified-Since as specified in RFC 7232" do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this spec test if If-Modified-Since is ignored? The date needs to be before modifcation time of the file for this.

Copy link
Contributor Author

@rwojsznis rwojsznis Jun 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of current 1s padding that I did not touched I don't think it's correct, see spec

it "should return 304 Not Modified if file mtime is equal" do

update: ouh, I think I can see it now ;)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, it seems there is something off... We need more integration tests for this.

ref_response = handle HTTP::Request.new("GET", "/test.txt")
headers = HTTP::Headers.new
headers["If-Modified-Since"] = ref_response.headers["Last-Modified"]
response = handle HTTP::Request.new("GET", "/test.txt")

This should result in 304 Not Modified (both with and without this PR), but it doesn't.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, this works as intended.

headers["If-None-Match"] = "some random etag"
response = handle HTTP::Request.new("GET", "/test.txt", headers)
response.status_code.should eq(200)
response.headers["Etag"].should match(/W\/"\d+"$/)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validation of the etag format and should be a separate spec.

response = handle HTTP::Request.new("GET", "/test.txt", headers)
response.status_code.should eq(200)
response.headers["Etag"].should match(/W\/"\d+"$/)
response.body.should eq(File.read("#{__DIR__}/static/test.txt"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this expectation is necessary. Checking the status code should be enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually re-added body assertion after I removed it because of this comment 😅 as first (original) spec does it and I guess it won't hurt, right? 🤔

@@ -24,12 +26,13 @@ class HTTP::StaticFileHandler
end

def call(context)
unless context.request.method == "GET" || context.request.method == "HEAD"
request_method = context.request.method
unless ALLOWED_METHODS.includes?(request_method)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{"GET", HEAD"}.includes?(context.request.method)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should support OPTIONS as well, but that's for another PR.

@@ -0,0 +1,48 @@
class HTTP::StaticFileHandler::Cache
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a module since there are no instances for this type.

if_none_match = context.request.headers[HTTP_IF_NONE_MATCH]?
return false unless if_none_match

if_none_match == etag(file_path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to recompute the etag, it's already available as context.response.headers[HTTP_ETAG]. Ditto below.

@rwojsznis
Copy link
Contributor Author

Thanks for comments guys! ❤️ Assuming this is acceptable way I will wrap it up (cleanup and address comments)

@straight-shoota I tried to keep it in a single file/class as suggested - but damn it seemed cluttered - I will give it another go tho and let's decide one more important issues are addressed ;-)

end

private def self.etag(file_path)
%{W/"#{File.info(file_path).modification_time.epoch.to_s}"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need .to_s here. In fact, it results in a additional, useless string allocation.


# According to RFC 7232:
# A recipient must ignore If-Modified-Since if the request contains an If-None-Match header field
return if_none_match == context.response.headers[HTTP_ETAG] if if_none_match
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion for better readable control flow:

# According to RFC 7232:
# A recipient must ignore If-Modified-Since if the request contains an If-None-Match header field
if if_none_match = context.request.headers[HTTP_IF_NONE_MATCH]?
  if_none_match == context.response.headers[HTTP_ETAG]
elsif if_modified_since = context.request.headers[HTTP_IF_MODIFIED_SINCE]?
  header_time = HTTP.parse_time(if_modified_since)
  # ...
else
  false
end

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using case would be even more readable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

case by using somehow by using the implicit-object syntax

@straight-shoota
Copy link
Member

Please move the methods from Change type directly into StaticFileHandler. There is no reason to have an extra type for that.

def self.match?(context : HTTP::Server::Context, file_path : String) : Bool
# According to RFC 7232:
# A recipient must ignore If-Modified-Since if the request contains an If-None-Match header field
if if_none_match = context.request.headers[HTTP_IF_NONE_MATCH]?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

case
when if_none_match = context.request.headers[HTTP_IF_NONE_MATCH]?
  # ...
when if_modified_since = context.request.headers[HTTP_IF_MODIFIED_SINCE]?
  # ...
else
  # ...
end

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why??

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arguably reads better than:

if if_none_match = context.request.headers[HTTP_IF_NONE_MATCH]?
  # ...
elsif if_modified_since = context.request.headers[HTTP_IF_MODIFIED_SINCE]?
  # ...
else
  # ...
end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be honest by first glance I don't see how one is better than other (and vice versa) 😅 I thought at first with crystal's implicit-object syntax there might be some more sophisticated approach, maybe let's just leave it as-it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is clearly a case for if. Yes, it could be written as case but that's a bit different semantically. And it's more to type. Theoretically, you could rewrite every if ... elseif as case ... when, but what would be the point?

@rwojsznis
Copy link
Contributor Author

Please move the methods from Change

um, from Cache you mean? I would leave it there for readability / re-usability of internals and (imo) cleaner testability but I won't fight over that, I would rather see this getting merged sooner than later :P

@straight-shoota
Copy link
Member

Yes, I mean Cache. I don't think it really improves readability, because it spreads the implementation across different files and types. These methods are not really re-usable outside the static file handler. In fact, it would even help reusability if these methods were instance methods on the handler class, so that inheriting classes can easily overwrite them.

We could certainly generalize match? and etag to make them reusable for other handlers, but we're not there yet. This needs some discussion about design (where to put them?) and I guess we can leave that for another day.

Currently, these methods are only internal implementation and as such, don't need isolated unit tests at all.

@rwojsznis
Copy link
Contributor Author

Resolved all comments (I think) and rebased whole thing; gonna eat some breakfast and make sure I didn't messed smtg up, I don't trust myself 😅

again thanks for all the feedback!

@rwojsznis rwojsznis changed the title [WIP/Proposal] Redo HTTP::StaticFileHandler Add Etag support in HTTP::StaticFileHandler Jun 8, 2018
@rwojsznis
Copy link
Contributor Author

rwojsznis commented Jun 8, 2018

any ideas why test_linux is failing? is this a known issue? 🙈

edit: this #5890 ?

Copy link
Member

@straight-shoota straight-shoota left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • spec add Etag header should also test the etag is different when mtime changes.
  • Would you rewrite the other existing examples to use initial_response as well? Name suggestions: returns 304 Not Modified for younger than Last-Modified and serves content for older than Last-Modified. You can use HTTP.parse_time to parse the Last-Modified header for calculations.

@@ -21,29 +21,94 @@ describe HTTP::StaticFileHandler do
response.body.should eq(File.read("#{__DIR__}/static/test.txt"))
end

context "with header If-Modified-Since" do
it "should add Etag header" do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a nit-bit: don't use should. Just it "adds Etag header" do is fine. Less noise.
I know, some of the existing examples in this file use should but it doesn't add anything. Of course all specs describe something that should work. Feel free to remove all of them.

headers = HTTP::Headers.new
headers["If-Modified-Since"] = HTTP.rfc1123_date(File.info("#{__DIR__}/static/test.txt").modification_time)
headers["If-Modified-Since"] = initial_response.headers["Last-Modified"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also add a expectation response.headers["Last-Modified"].should eq initial_response.headers["Last-Modified"] later to ensure the header doesn't change between requests.

headers["If-Modified-Since"] = initial_response.headers["Last-Modified"]

response = handle HTTP::Request.new("GET", "/test.txt", headers), ignore_body: true
response.headers["Content-Type"]?.should eq(nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this needs to be a separate example. Just add this expectation to the previous one.

s/eq(nil)/be_nil/

end

it "should serve file if file mtime is younger" do
headers = HTTP::Headers.new
headers["If-Modified-Since"] = HTTP.rfc1123_date(File.info("#{__DIR__}/static/test.txt").modification_time - 1.hour)
response = handle HTTP::Request.new("GET", "/test.txt")
response = handle HTTP::Request.new("GET", "/test.txt", headers)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add ignore_body: false to make it explicit that this is expecting content.

headers = HTTP::Headers.new
headers["If-None-Match"] = "some random etag"

response = handle HTTP::Request.new("GET", "/test.txt", headers)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto ignore_body: false

initial_response = handle HTTP::Request.new("GET", "/test.txt")

headers = HTTP::Headers.new
headers["If-Modified-Since"] = HTTP.rfc1123_date(File.info("#{__DIR__}/static/test.txt").modification_time - 1.hour)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would also benefit from using Last-Modified:

HTTP.parse_time(initial_response.headers["Last-Modified"]) - 1.hour

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but you would have to parse it back with rfc1123_date (not mentioning that parse_time returns Time?, so not_nil! would have to be added as well) - is it really worth it? 🤔


it "should serve a file if header does not match etag" do
headers = HTTP::Headers.new
headers["If-Modified-Since"] = HTTP.rfc1123_date(File.info("#{__DIR__}/static/test.txt").modification_time)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

response.status_code.should eq(304)
end

it "should serve a file if header does not match etag" do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add even If-Modified-Since is fresh.

end
end

private def etag(file_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should take a modification_time so that we only call File.info(path).modification_time once, this saves performance and avoids race conditions where the etag and last modified are different.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Race condition doesn't matter, but performance and reusability 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could even move this as a property of Response for even more reusability.

HTTP_IF_MODIFIED_SINCE = "If-Modified-Since"
HTTP_IF_NONE_MATCH = "If-None-Match"
HTTP_ETAG = "Etag"
HTTP_LAST_MODIFIED = "Last-Modified"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these consts should be private, or maybe even just removed and inlined. We don't typically agressively extract literals in crystal.

@rwojsznis rwojsznis force-pushed the redo_static_file_handler branch 2 times, most recently from 5d540a0 to c5fcce7 Compare June 11, 2018 06:29
@rwojsznis
Copy link
Contributor Author

How about now? 🙏

Copy link
Member

@straight-shoota straight-shoota left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a small improvement, apart from that: 👍

# An exact comparison might be slightly off, so we add 1s padding.
# Static files should generally not be modified in subsecond intervals, so this is perfectly safe.
# This might be replaced by a more sophisticated time comparison when it becomes available.
last_modified = modification_time(file_path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe move this to the call method and pass modification_time to both add_cache_header and cache_request? to avoid another duplicate file stat.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point! rebased PR once again 😅

@sdogruyol
Copy link
Member

This is a great improvement over the previous StaticFileHandler. I can already see many frameworks / libraries benefiting from this 🕺

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants