-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add HTTP::Request#host and HTTP::Request#host_with_port for convenience #3075
Conversation
@@ -98,6 +98,16 @@ class HTTP::Request | |||
value | |||
end | |||
|
|||
# Return request host from headers | |||
def host | |||
@headers["Host"]?.try &.to_s.sub(/:\d+\z/, "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we drop the regex here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's your suggestion instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to suggest the same :-)
Simply find the index of ':'
and if such index exist make a substring until that index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it faster than using a Regex?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, using Regex is almost always slower than not using Regex. Specially if this is just a char index lookup. Matching a regex involves much more logic than just a char index, and also allocated memory for the match results (and I don't know if PCRE allocates memory for that too).
Try this:
require "benchmark"
def regex(string)
string.sub(/:\d+\z/, "")
end
def index(string)
index = string.index(':')
index ? string[0...index] : string
end
# This is so LLVM can't optimize it that easily
string = String.build { |str| str << "www.google.com:80" }
pp regex(string)
pp index(string)
Benchmark.ips do |x|
x.report("regex") { regex(string) }
x.report("index") { index(string) }
end
I get:
regex(string) # => "www.google.com"
index(string) # => "www.google.com"
regex 1.41M (±14.83%) 4.38× slower
index 6.18M (±15.03%) fastest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(though I must admit that I expected it to be much slower with regex)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed benchmark @asterite 🙏
Removed the regex 👍 |
def host | ||
return unless @headers["Host"] | ||
index = @headers["Host"].index(":") | ||
index ? @headers["Host"][0...index] : @headers["Host"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please assign @headers["Host"]
to a local variable, otherwise we have 2 lookups inside that hash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still 2 lookups :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this should be something like:
host = @headers["Host"]?
return unless host
# etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated 👍
host = @headers["Host"]? | ||
return unless host | ||
index = host.index(":") | ||
index ? host[0...index] : host |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can replace this by:
host_with_port.try &.split(':').[0]
and add, for completeness:
def port
host_with_port.try &.split(':').[1]
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But split will create an array and two strings, which will be slower. Like this is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure whether the Host
header can ever contain an ipv6 address, (formatted [1234:5678::1]:80
?) but if it does, this code will break.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
String#rindex
should work regardless.
Update: how about using URI.parse("//#{@headers["Host"]}").host
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can consider the index of ]
first, and compute the index of :
after it if it's there, and then split.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
URI 232.25k (±57.98%) 1.64× slower
custom 380.77k (±25.71%) fastest
regex 26.4k (±16.06%) 14.43× slower
Using
/\A(?<host>[^:]+?|\[.+?\])(?::(?<port>\d+))?\z/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RX14 It's less about guessing intent as more about preventing false positives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the benchmark results. For increased redability and reduced complexity (and all the maintenance and security benefits that come with it), I'm in favor of using URI
. A 1.64x slowdown compared to the custom implementation is nothing. In terms of Moore's Law, this performance drawback will be amortized in less than a year. ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Speed benchmarks don't tell everything. What about memory usage? We merely want to return the Host header and want to be nice to the developer by extracting/skipping the :port
part out of it. It's definitely not an URI, just a host:port
definition.
Using URI.parse
means that we instantiate an URI object on the Heap (it puts pressure on the GC) just to discard it immediately because we're only interested in the Host and maybe the Port values.
I believe reproducing URI::Parse#parse_host (which looks like what NGINX is doing) or the more straightforward solution from @RX14 would be more correct, and achieve the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't know if this is anything useful, but I played around a bit with different implementations and benchmarking using @jhass's example host strings.
Is still on the radar? |
I really don't think this should have been merged without ipv6 support. |
Oh, I forgot about this... we can improve this in the future |
I agree with @asterite, it's better to have this for now. We can improve later |
Today i needed to access the
host
of the incomingHTTP::Request
and couldn't find a convenient way to do so in Crystal.I used rack for the implementation.