Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add HTTP::Request#host and HTTP::Request#host_with_port for convenience #3075

Merged
merged 1 commit into from
Nov 20, 2016
Merged

Conversation

sdogruyol
Copy link
Member

@sdogruyol sdogruyol commented Aug 1, 2016

Today i needed to access the host of the incoming HTTP::Request and couldn't find a convenient way to do so in Crystal.

I used rack for the implementation.

@@ -98,6 +98,16 @@ class HTTP::Request
value
end

# Return request host from headers
def host
@headers["Host"]?.try &.to_s.sub(/:\d+\z/, "")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we drop the regex here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's your suggestion instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to suggest the same :-)

Simply find the index of ':' and if such index exist make a substring until that index.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it faster than using a Regex?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, using Regex is almost always slower than not using Regex. Specially if this is just a char index lookup. Matching a regex involves much more logic than just a char index, and also allocated memory for the match results (and I don't know if PCRE allocates memory for that too).

Try this:

require "benchmark"

def regex(string)
  string.sub(/:\d+\z/, "")
end

def index(string)
  index = string.index(':')
  index ? string[0...index] : string
end

# This is so LLVM can't optimize it that easily
string = String.build { |str| str << "www.google.com:80" }

pp regex(string)
pp index(string)

Benchmark.ips do |x|
  x.report("regex") { regex(string) }
  x.report("index") { index(string) }
end

I get:

regex(string) # => "www.google.com"
index(string) # => "www.google.com"
regex   1.41M (±14.83%)  4.38× slower
index   6.18M (±15.03%)       fastest

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(though I must admit that I expected it to be much slower with regex)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed benchmark @asterite 🙏

@sdogruyol
Copy link
Member Author

Removed the regex 👍

def host
return unless @headers["Host"]
index = @headers["Host"].index(":")
index ? @headers["Host"][0...index] : @headers["Host"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please assign @headers["Host"] to a local variable, otherwise we have 2 lookups inside that hash.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still 2 lookups :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this should be something like:

host = @headers["Host"]?
return unless host
# etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated 👍

host = @headers["Host"]?
return unless host
index = host.index(":")
index ? host[0...index] : host
Copy link
Contributor

@MaloJaffre MaloJaffre Aug 1, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can replace this by:

host_with_port.try &.split(':').[0]

and add, for completeness:

def port
  host_with_port.try &.split(':').[1]
end

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But split will create an array and two strings, which will be slower. Like this is fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether the Host header can ever contain an ipv6 address, (formatted [1234:5678::1]:80?) but if it does, this code will break.

Copy link
Contributor

@Sija Sija Aug 1, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

String#rindex should work regardless.
Update: how about using URI.parse("//#{@headers["Host"]}").host?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can consider the index of ] first, and compute the index of : after it if it's there, and then split.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

   URI 232.25k (±57.98%)  1.64× slower
custom 380.77k (±25.71%)       fastest
 regex   26.4k (±16.06%) 14.43× slower

Using

/\A(?<host>[^:]+?|\[.+?\])(?::(?<port>\d+))?\z/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RX14 It's less about guessing intent as more about preventing false positives.

Copy link
Contributor

@paddor paddor Aug 7, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the benchmark results. For increased redability and reduced complexity (and all the maintenance and security benefits that come with it), I'm in favor of using URI. A 1.64x slowdown compared to the custom implementation is nothing. In terms of Moore's Law, this performance drawback will be amortized in less than a year. ;-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Speed benchmarks don't tell everything. What about memory usage? We merely want to return the Host header and want to be nice to the developer by extracting/skipping the :port part out of it. It's definitely not an URI, just a host:port definition.

Using URI.parse means that we instantiate an URI object on the Heap (it puts pressure on the GC) just to discard it immediately because we're only interested in the Host and maybe the Port values.

I believe reproducing URI::Parse#parse_host (which looks like what NGINX is doing) or the more straightforward solution from @RX14 would be more correct, and achieve the same.

Copy link
Contributor

@paddor paddor Aug 8, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't know if this is anything useful, but I played around a bit with different implementations and benchmarking using @jhass's example host strings.

@samueleaton
Copy link
Contributor

Is still on the radar?

@asterite asterite merged commit b6e4744 into crystal-lang:master Nov 20, 2016
@asterite asterite added this to the 0.20.0 milestone Nov 20, 2016
@RX14
Copy link
Member

RX14 commented Nov 20, 2016

I really don't think this should have been merged without ipv6 support.

@asterite
Copy link
Member

Oh, I forgot about this... we can improve this in the future

@sdogruyol
Copy link
Member Author

I agree with @asterite, it's better to have this for now. We can improve later

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants