Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(WIP) Refactor HTTP::Client to use a transport #6001

Conversation

straight-shoota
Copy link
Member

This PR is a first step to refactor HTTP::Client to make it more customizable by extracting the actual connection management to a Transport class.

Every HTTP::Client uses a configurable transport to establish the actual IO connection. By default, this is HTTP::Client.default_transport which is an instance of DefaultTransport and basically enables connections to any TCP endpoint (the implementation is currently very rudimentary).

If HTTP::Client is initialized with a host and port argument, it will use a TCPTransport connecting to a fixed endpoint. This is basically the behaviour as it was until now: the Client instance is bound to a specific connection.

The static methods like HTTP::Client.get will use HTTP::Client.default_client by default. But they can also receive a transport argument and create a custom client using that transport (thus it won't be reused).

This proposal allows for more advanced features to be added to HTTP::Client, which should rather be a high-level interface while Transport does the heavy lifting.
One such feature is to make requests to Unix sockets (#2735) which is already implemented in this PR.
Further enhancements would be connecting through a proxy transport as well as an advanced connection management for DefaultTransport which can pool and re-use connections.

While still work in progress I'd like to get some feedback on the general design. Please don't comment on implementation details, I know there are many rough edges. The docs are not updated yet either.

The interface between HTTP::Client and Transport probably needs to be more refined.
Currently, I've basically extracted the part where a TCPSocket was created and replaced it with transport.connect(uri, request). The transport establishes an appropriate connection and returns the IO. The decision where to connect can be based on the URI and request.
I'm contemplating about moving the request and response serialization entirely to the transport. This would move more low level stuff out of the client. It would also allow the transport to easily return responses without reading them from an IO. Practical applications for this are for example a response cache or implementing a transport for the file scheme to serve local files as HTTP responses directly without serializing through an IO.

@asterite
Copy link
Member

It seems Transport is also a thing in Go. Since we are copying Go so much, I can't understand why so many of you are against the idea of making Crystal more typed and more similar to other typed languages.

@straight-shoota
Copy link
Member Author

@asterite I'm not sure if your comment is sarcastic or just off-topic.

But yes, Go has such a transport as well and so have other HTTP clients in other languages. At least this has nothing to do with the type system.

I was just compiling a list of similar implementations for reference:

  • Ruby: Hurley has a connection property which is a lambda request -> response
  • Go: RoundTripper (interface request -> response) and Transport (implementation, including proxy, reusing connections, HTTP/2 and a lot of basic HTTP workflow like redirects).
  • Java: com.google.api.client.http.HttpTransport: builds a request (method, url -> request ) and request.execute() gets the response.
  • Ruby: Faraday implements a middleware interface similar to HTTP::Handler.

@asterite
Copy link
Member

Cool, I thought Transport was Go-exclusive.

@asterite
Copy link
Member

port is now required?

@straight-shoota
Copy link
Member Author

straight-shoota commented Apr 24, 2018

Internally, port was always required. You can't connect to a TCP socket without a port. But formerly, if omitted in the constructor, it was automatically deduced from tls which meant "use tls" or "don't use tls". The meaning of tls has now changed, it no longer enforces to use tls for every request, but only those that are actually using https scheme.
This means, you can use the same client to do https and http requests.

It would be possible to provide a default argument for port, but I'm not sure that makes much sense. If tls = false it's easy, but if tls = true either 80 or 443 make sense. So I think it's better to make it required in any case.

@asterite
Copy link
Member

This means, you can use the same client to do https and http requests

Huh?

@straight-shoota
Copy link
Member Author

client = HTTP::Client.new(tls: true)
client.get("https://crystal-lang.org/")
client.get("http://crystal-lang.org/") # -> this wouldn't work with the previous meaning of `tls`

@asterite
Copy link
Member

Strange, I thought we only allowed URIs, not full URLs, when initializing a client. But maybe it makes sense, I don't know. In the old ways it used to connect to a server with that base URL, I don't know what's going on now, but it's probably okay...

@straight-shoota
Copy link
Member Author

Not sure what you mean... URI is a superset of URL.

But right, currently every HTTP::Client instance can only be used to make HTTP requests against one endpoint that needs to be specified in the constructor. With this PR you can use one HTTP::Client to make requests anywhere. It doesn't only use a single TCP connection, but can issue connections to different servers through Transport.

client = HTTP::Client.new
client.get("https://crystal-lang.org/")
client.get("https://github.com/crystal-lang")

Or keep the current behaviour to connect to one server:

client = HTTP::Client.new("crystal-lang.org", 443)
client.get("/")
client.get("/docs")

@bararchy
Copy link
Contributor

bararchy commented Apr 25, 2018

I love the first example more

client = HTTP::Client.new
client.get("https://crystal-lang.org/")
client.get("https://github.com/crystal-lang")

@Sija
Copy link
Contributor

Sija commented Apr 25, 2018

It would be possible to provide a default argument for port, but I'm not sure that makes much sense. If tls = false it's easy, but if tls = true either 80 or 443 make sense. So I think it's better to make it required in any case.

IMO that's too common use-case, why not use 80/443 as default, depending on http/https scheme?

btw, tls with port 80 does not make sense.

@straight-shoota
Copy link
Member Author

straight-shoota commented Apr 25, 2018

@bararchy It depends on the use case. If you just need to query one API endpoint, it's much easier to use a client that's specifically connected to that endpoint.

@Sija Because the constructor HTTP::Client.new(host) wouldn't know the scheme you want to send requests with in advance. It could be viable to add an option to specify the port as "http" or "https" and derive the default value from that. But you'll need to specify either a port or a scheme, and specifying a port makes more sense to me. From a port you can always deduce the default scheme (80 -> "http", 443 -> "https", other -> nil), from just a scheme, you can't be certain about which port to use.
There is also the possibility to use a URI as argument, which can include scheme, host and port:

HTTP::Client.new(URI.parse("https://example.com")) # connects to default port 443
HTTP::Client.new(URI.parse("http://example.com:1234")) # connects to port 1234

@asterite
Copy link
Member

So how does keep-alive work?

@straight-shoota
Copy link
Member Author

@asterite I think that should be implemented in the transport. That's why it probably makes sense to read the response there (last paragraph in the OP). For now, the TCPTransport keeps the connection and reuses it by default (until close is called, currently from keep-alive handling in the client).
DefaultTransport opens a new connection for every request, but as already mentioned should be enhanced to use a connection pool for re-using TCP connections across clients.

@ysbaddaden
Copy link
Contributor

IMHO: this sounds awfully more complex than necessary.

  • Why bother with Transport and TCPTransport and probably SSLTransport and UNIXTransport when we have IO objects? => unrequited new concept, new naming.
  • Why require to specify the port when standards define defaults? => unrequited nuisance.

No need to over-engineer HTTP::Client. The HTTP::Client#basic_auth proves that connecting to whatever is wrong: authentication is tied to a domain, not to every domain the client will connect to (security issue)!

Please, instead of pushing some idea with code to review, open an issue with detailed limitations and problems with HTTP::Client (and whatever else) you think there are, so we can discuss them, see whether they're really limitations, and how we should deal with them (or not).

@straight-shoota
Copy link
Member Author

@ysbaddaden This PR is a proof of concept and not intended to be reviewed. It's far to incomplete for that anyway. I've just been exploring this approach to improve and simplify the http client implementation. And I wanted to share that and use it to support a discussion. Maybe it would have been better to open an issue and posted the code parallel to that. In fact, I can still do that.

TCPTransport is only there to keep the current behaviour of connecting the client to one specific server (and keeping that connection open) because DefaultTransport is not fully implemented. It doesn't have to stay that way and it can probably be removed later. TCPTransfer is not meant to be explicitly used anyway. It's mostly an interim internal implementation.

UnixTransport is already included in this PR and I don't think there is a viable alternative to that if we want HTTP::Client to be able to connect to Unix sockets. We have IO objects, but they need to be created somewhere. Maybe it can be simplified to pass in a UnixSocket (or any IO) directly as transport.

@asterite
Copy link
Member

It's a bit strange because in Go a Transport (well, RoundTripper) takes a Request and returns a Response, not an IO.

I think Transport will make it easier to test an HTTP::Client, too. Right now there are no (any?) tests for HTTP::Client.

@RX14
Copy link
Contributor

RX14 commented Apr 26, 2018

I really don't want to make things more complex here either. Why isn't IO enough? HTTP is a wire protocol, you write a request and receive a response. IO covers this exact API perfectly.

If we want middleware - which is looks to me is what you really want - let's implement middleware not this.

And lastly, @asterite, I work on crystal because it's not just another typed language. Making crystal into just another normal typed language, even with type inference, would completely destroy any advantage crystal has and be completely tone-deaf to the audience of crystal.

@straight-shoota
Copy link
Member Author

Passing the client simply an IO can't be enough. How would you make simultaneous requests over the same client if it can't create a new IO itself? Or re-use connections to different servers?

Middleware is related but not dependend on this nor could this be implemented using middleware unless you would use the middleware to completely hijack the requests-response roundtrip.

@asterite
Copy link
Member

@straight-shoota It seems there's a misunderstanding about how HTTP::Client was originally designed to work. The idea is that you create it with one server in mind, and execute queries against one server.

How would you make simultaneous requests over the same client if it can't create a new IO itself?

This wasn't intended in the original design. You can have one request per time. To execute a next request you must consume the last response. That's also why "keep-alive" is important here: with one server, "keep-alive" is true by default (depending on the HTTP protocol of the server), and so you can keep that connection alive.

It seems this PR is completely changing how HTTP::Client works. That's why for me it makes no sense to do HTTP::Client.new (to which server??)

That's also why doing this:

client.get("http://...")

makes no sense to me. You are targeting a single server, you don't need to repeat the scheme, host, etc. Just send the path.

If you want to redesign HTTP::Client, that's fine. But it shouldn't be done in a PR that completely changes how it works. I know you opened another issue for this, but it seems we first have to clarify how we want HTTP::Client to work.

@asterite
Copy link
Member

Or re-use connections to different servers?

That's why we need a pool of HTTP::Clients. But you'd have one for each different server, so doing HTTP::Client.get(...) would, in my mind, use that pool, and try to see if there's already an existing client connected to that server that can be used. Of course this is difficult if we have to track that each client can only perform one request at a time...

@RX14
Copy link
Contributor

RX14 commented Apr 27, 2018

Indeed, it's far easier to think of a pool of http clients, instead of a http client which contains a pool.

@straight-shoota
Copy link
Member Author

straight-shoota commented Apr 29, 2018

@asterite I took it that connecting to a single server was only a temporary limitation because a proper implementation is missing, but not as the final design for HTTP::Client. That behaviour of wrapping a single socket is to me merely an http connection (or HTTP:IO) - and that doesn't sound particularly useful as a high-level interface overall. From an http client I would expect more sophisticated features such as the ability to send requests to different servers.
What you describe as "pool of http clients" should in my opinion be the actual http client. The current HTTP::Client (in my description a http connection) could be used as a basis for a "pool of http clients" but it a) doesn't need to be publicly exposed and b) it makes more sense if it was a "pool of connections" and the http protocol is added in the pool instead of in each connection.

The class-level interface of HTTP::Client represents a interface that I'd expect from a http client: it lets you send request to any server. Implementation-wise it is very limited though, as each call will initialize a new client with a new TCP connection - which is not very efficient especially if connecting to the same servers again and again.

The solution currently would to create instances of HTTP::Client and execute requests to specific servers through these. But then you can't send requests concurrently to the same server unless you connect with multiple HTTP::Client instances. Keeping maybe dozens or even more instances of HTTP::Client around to be able to re-use connections will need some of management to limit the number of concurrent requests to the same server and reduce the number of unused idle connections.

These are all features that a proper HTTP client should have in my opinion. Given that the use case "concurrently request HTTP resources from the same or different servers" does not sound pretty rare for typical Crystal applications, I am convinced that a HTTP client implementation in the stdlib should be able to support that.

@asterite
Copy link
Member

You are probably right, though I wouldn't know how to implement keep alive that way. And then an instance of an HTTP Client probably makes no sense if we go that way, because it's more useful to use the class methods.

If someone wants to implement this, go ahead.

@straight-shoota
Copy link
Member Author

Yes, the implementation of the connection pool is going to be very complex. I've looked through Go's Transport implementation, it's really huge.

I think instances of HTTP::Client are still useful to hold custom configuration for SSL context, high-level HTTP protocol features such as cookies and following redirects, and additional middleware.

@ysbaddaden
Copy link
Contributor

ysbaddaden commented Apr 29, 2018

@straight-shoota what you want isn't a HTTP client, but a general Web Browser-like API (or libcurl).

What we mean with HTTP Client could be confused with a HTTP Connection, but there are still differences. For example in HTTP/2 the Connection is generic, then a Client and Server can be built on top of it (they behave differently), yet they still act over a single HTTP connection, not over many.

It's important to not merge all these layers (client, protocol, connection, io), and more —the file scheme isn't even related to the HTTP protocol.

What you want to achieve is another layer, supporting different schemes such as http (using HTTP::Client), file or ws. Hell, it could even support ftp, sftp or gopher and be extendable. That would be great. Wouldn't pushing this into HTTP::Client, even as class methods, be a mistake?

@asterite
Copy link
Member

@ysbaddaden I'm starting to agree with @straight-shoota . At least Go's http client can be used by multiple fibers and it works fine, so you can have multiple concurrent requests, something that right now doesn't work well... and if we don't do that, then any class that wraps an API won't be concurrent by default, and all of that logic must be implemented in each case, which is a burden.

I think @waj was also interested in having HTTP::Client have a pool of connections, though I don't think he'll have time to do this, but maybe he has some time to comment on this.

@RX14
Copy link
Contributor

RX14 commented Apr 30, 2018

I'm starting to want to clean-sheet the whole of HTTP to be honest... With http2 in mind.

@jkthorne
Copy link
Contributor

I like this idea. I have been swapping out or monkey patching the socket to do some fun things. but I would be interested in where this goes. I am still a little fuzzy on all the features that you intend to get out of this. Is it fair to say you want:

  • concurrency support
  • connection pooling
  • agnostic IO

@miketheman
Copy link
Contributor

#codetriage This PR appears to have been a good discussion point months ago, and stalled.
It's unclear what the future ought to be for HTTP::Client behaviors and how extensively the default behaviors should be - this probably deserves a scoping session to determine what behaviors we've seen from other languages/libraries/frameworks.

@straight-shoota
Copy link
Member Author

I think this can be closed. Let's continue the discussion in #6011

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants