Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Packet dispatch #19

Open
batonius opened this issue Jun 18, 2017 · 43 comments
Open

[RFC] Packet dispatch #19

batonius opened this issue Jun 18, 2017 · 43 comments

Comments

@batonius
Copy link
Contributor

Problem

Currently, sloltcp uses iteration to dispatch ingress packets to sockets and to poll sockets for egress packets. While acceptable on the platforms without libstd and memory allocation, this approach is very inefficient on systems where growable containers are available.

Discussion

  1. The problem is two-fold: ingress packets should be dispatched by lookup tables based on their destination, while only "dirty" sockets should be polled for egress packets.
  2. After Add raw sockets #18 is merged, there will be two kinds of sockets: l3 (raw) sockets should be dispatched to based on packet's IP version and IP protocol number, and l4 sockets (tcp and udp) should be dispatched to based on the dst port.
  3. Sockets can be bound to a specific IP address. As a result, a solution should support several sockets of the same type listening on the same port but bound to different IP addresses.
  4. A solution should keep the existing behavior when compiled without the std feature.

Proposal

  1. Dispatch packets to sockets based on the port/protocol number only, leaving the question of sockets being bound to IPs to the Socket::proccess method.
  2. Expand SocketSet by (conditionally) adding three std::HashMap's to it, for raw, tcp and udp sockets, and a Vec to keep dirty sockets' handlers.
  3. Hash map for raw sockets should have a type of std::HashMap<(RawSocketType, IpProtocol), Vec<Handle>>, hash maps for udp and tcp should have a type of std::HashMap<u16, Vec<Handle>> . Vec is required to support the possibility of several sockets listening to the same port/protocol on different addresses.
  4. Extend SocketSet::add to use socket's endpoint and ip_protocol methods to construct an index to insert socket's handler into the corresponding lookup table.
  5. Extend SocketSet::remove to remove socket from a lookup table based on socket's endpoint.
  6. Extend Set::IterMut to support construction from a mutable reference to ManagedSlice and an iterator of Handles, producing an iterator of Socket.
  7. Replace SocketSet::iter_mut with iter_raw(&mut self, ip_repr: &IpRepr) -> IterMut and iter_l4(&mut self, ip_repr: &IpRepr) -> IterMut, which should look up sockets based on ip_repr and return an iterator over the sockets found. With the std feature disabled, the functions should return iterators over the entire SocketSet.
  8. Add iter_dirty to SocketSet to produce an iterator of Socket by draining dirty_sockets vector, or an iterator over the entire SocketSet if std is disabled.
  9. Add mark_drity to SocketSet to mark a socket as dirty.
  10. Replace iter_mut in EthernetInterface::poll with iter_raw and iter_l4 and in EthernetInterface::emit with iter_dirty.

Questions

  1. The only breakage of the current public API would be the requirement to explicitly mark written-to sockets as dirty. This can be avoided by having SocketSet::get_mut mark each requested socket as dirty (which kinda defeats the purpose), by having a specialized method SocketSet::get_mut_for_write to do so, or by devising some smart handler type which would mark a socket as dirty on send.
  2. I'm not sure if the reference counting in Set::Item is relevant here.
  3. I think EthernetInterface::poll should be refactored into several methods to increase readability.
@batonius batonius changed the title [RFC] Packet dispatching [RFC] Packet dispatch Jun 18, 2017
@whitequark
Copy link
Contributor

Sockets can be bound to a specific IP address. As a result, a solution should support several sockets of the same type listening on the same port but bound to different IP addresses.

This already works, doesn't it?

while only "dirty" sockets should be polled for egress packets

I have an idea that doesn't involve even more conditional compilation. Add two storage items:

  • Every socket should gain a "dirty" flag, internally usable by SocketSet;
  • SocketSet should gain an additional storage area, sized the same as its socket list, structured as a ring buffer, containing indexes of sockets. Every time a clean socket is made dirty, an index of the dirty socket is added to that ring buffer. This makes it possible to do O(1) polling for egress packets.

@whitequark
Copy link
Contributor

Regarding ingress, using BTreeMap instead of a HashMap reduces the std requirement to a mere collections requirement, and makes it possible to do fast ingress handling on no_std code. Otherwise it does look like the only option to me.

@batonius
Copy link
Contributor Author

This already works, doesn't it?

Yeah, I meant it for the packet dispatch mechanism, meaning a simple port -> socket map won't work.

I have an idea that doesn't involve even more conditional compilation

Right, we don't need a vector for this, we can use ManagedSlice.

Every time a clean socket is made dirty, an index of the dirty socket is added to that ring buffer.

How tho? Right now a Socket doesn't hold a reference back to the SocketSet it's in to report it's dirty and adding it would require Weak<RefCell<>> or something like this. I have some ideas I've sketched above but I don't like any of them.

using BTreeMap instead of a HashMap

Right, so far it looks like

#[cfg(any(feature = "std", feature = "collections"))]
#[derive(Debug)]
struct LookupTables {
    raw: BTreeMap<(RawPayloadType, IpProtocol), BTreeSet<usize>>,
    tcp: BTreeMap<u16, BTreeSet<usize>>,
    udp: BTreeMap<u16, BTreeSet<usize>>,
    dirty: Vec<usize>,
}

#[cfg(not(any(feature = "std", feature = "collections")))]
#[derive(Debug)]
struct LookupTables {}

#[derive(Debug)]
pub struct Set<'a, 'b: 'a, 'c: 'a + 'b> {
    sockets: ManagedSlice<'a, Option<Item<'b, 'c>>>,
    lookup_tables: LookupTables,
}

@whitequark
Copy link
Contributor

whitequark commented Jun 21, 2017

OK, this looks more or less fine to me. A few suggestions:

  • Let's keep SocketSet and what you call LookupTables separate. Right now, with the BTreeSet<usize>, there is a potential ABA problem. If you use SocketHandles from the public API of SocketSet then it is not an issue.
  • Let's call LookupTables a smoltcp::routing::SocketRouter, since that's what it does: it routes packets to and from sockets. In the future I can see there being smoltcp::routing::NodeRouter, and they will probably share some data structures even.
  • The API consumer of smoltcp has to uphold the contract of SocketRouter to ensure that packets are promptly dispatched, that is, it has to set the dirty flag in SocketRouter at the same time as actually dirtying the socket. (This is somewhat ugly, but based on my experience wrapping smoltcp, it is not going to be a problem.) Therefore SocketRouter is external to all of: SocketSet, Socket (of any kind), and EthernetInterface. In fact I suggest that SocketRouter be passed into poll just like SocketSet.
  • Almost always, there will be exactly one socket per port in the entire SocketRouter since multi-address setups are rare. How about using (u16, IpAddress) (note the order) as the key of BTreeMap? This gets rid of allocations and indirections in to BTreeSet, which almost always are just wasting resources, but does not (I believe) pessimize the comparisons because almost always the comparison will return early after determining that the ports are not equal. Of course you still have to compare the IP address in the key but a 4/16 byte comparison should be cheaper than two pointer accesses.

Also, I encourage you to get early feedback by updating the PR in small pieces.

@batonius
Copy link
Contributor Author

I don't like the idea of using routing in the names for packet dispatch. "Routing" is a well-defined concept and packet dispatch isn't usually a part of it. For starters, there are no "routes" involved :) I think something like smoltcp::dispatching::Dispatcher or DispatchTable would be a better name.

How about using (u16, IpAddress) (note the order) as the key of BTreeMap

This means we would have to do two searches, first with IpAddress set to the src_ip from the packet to search for an address-bound socket and then, if it fails (and it will more often than not), with IpAddress::Undefined to search for a not bound one. I'm not against it tho, it may be better from the API pov, since we would return an Option and not an iterator.

Also, I think I'll make two PRs, since ingress and egress dispatching are mostly independent.

@whitequark
Copy link
Contributor

This means we would have to do two searches, first with IpAddress set to the src_ip from the packet to search for an address-bound socket and then, if it fails (and it will more often than not), with IpAddress::Undefined to search for a not bound one.

Nope! We can use BTreeMap::range to perform just one search.

@whitequark
Copy link
Contributor

I don't like the idea of using routing in the names for packet dispatch. "Routing" is a well-defined concept and packet dispatch isn't usually a part of it. For starters, there are no "routes" involved :) I think something like smoltcp::dispatching::Dispatcher or DispatchTable would be a better name.

Agreed. On further thinking smoltcp::socket::SocketDispatcher sounds good, since it is functionally adjacent to smoltcp::socket::SocketSet in every way.

@batonius
Copy link
Contributor Author

batonius commented Jun 21, 2017

Nope! We can use BTreeMap::range to perform just one search.

I don't get it. Suppose you have three sockets, *:123, 192.168.1.1:123 and 192.168.1.3:123. They are stored in the BTreeMap in this order. How would you construct a range to find the right socket for each address in 192.168.1.1..192.168.1.4?

@whitequark
Copy link
Contributor

How would you construct a range to find the right socket for each address in 192.168.1.1..192.168.1.4?

Why do you need this?

@batonius
Copy link
Contributor Author

batonius commented Jun 22, 2017

So here's my current attempt: master...batonius:packet_dispatch . I've spent some time on these lifetimes :)

What I'm thinking of doing next is

  1. Add process_repr methods to all the sockets which would be the same as process but would accept tcp/udp/raw reprs.
  2. Add l4 header parsing to EthernetInterface::poll, call get_tcp_sockets or get_udp_sockets depending on l4, and loop over the resulting iterator + filter_map(<Socket as AsSocket<TcpSocket>>::try_as_socket) calling process_repr on the sockets. This way we won't be parsing l4 headers two times.

I'm not sure I like SocketDispater being a separate entity from the API pov, the usage in the tests isn't quite friendly. You have to first add a socket to a SocketSet and then not forget to add it again to a Dispatcher, this time requesting it from the SocketSet. If we ignore egress packets for now, hiding Dispatcher inside SocketSet looks like a clean and reasonable solution.

@whitequark
Copy link
Contributor

Add process_repr methods to all the sockets which would be the same as process but would accept tcp/udp/raw reprs.

You mean split process into the part that parses and accepts/rejects the packet and the rest? Hm.

@batonius
Copy link
Contributor Author

batonius commented Jun 22, 2017

Yes. Basically, we now need to parse l4 headers in EthernetInterface::poll to get the dst port to dispatch the packet anyway, so there's no reason to parse it again in Socket::proccess. These methods will be socket-specific, but we can iterate over specific sockets only with filter_map.

I'm not sure about the accept/reject part tho, since without std/collections we're still gonna iterate over all the sockets so we can't just assume the packet is for the socket in these new methods. We can have separate versions for std/no_std, but I'm not sure if it's worth it since the accept/reject check is just a couple of u32/u16 comparisons.

@whitequark
Copy link
Contributor

I'm not happy about this design. How about we factor out the IP handling machinery from EthernetInterface and then keep all of our L4 handling code there?

@batonius
Copy link
Contributor Author

I'm all for it, I've stated EthernetInterface::poll as an issue in the OP :)

I propose I rename that I called SocketDispatcher to SocketDispatchTable, put it inside SocketSet, add iface/ip_dispatcher.rs with IpDispatcher and move all the IP processing logic (everything from the EthernetProtocol::Ipv4 branch) to it, refactoring it into several methods in the process, while keeping the EthernetInterface's interface intact. Again, I'm explicitly ignoring egress packets for now.

@whitequark
Copy link
Contributor

move all the IP processing logic (everything from the EthernetProtocol::Ipv4 branch) to it, refactoring it into several methods in the process, while keeping the EthernetInterface's interface intact

Let's start with this. It's long overdue, will have a lot of benefits, and necessary for this anyway.

@batonius
Copy link
Contributor Author

Here's my attempt at refactoring EthernetInterface::poll: master...batonius:poll_refactor . I've tried to keep the logic intact, but I've made two changes I think are reasonable:

  1. We don't try to find a socket to process a packet with non-icmp/tcp/udp proto. Instead, we just send
    PROTO_UNREACHABLE If no raw socket processed it successfully.
  2. We end the iteration over tcp/udp sockets after the first non-rejected one.

I haven't factored it out into a separate module as I was going to because these functions either have no state of their own or rely on EthernetInterface's internal state. With these changes, it should be trivial to plug in the socket dispatcher just by replacing iter_mut().filter_map with specialized search methods.

If you're ok with it, I'll make a PR so I can move on with my packet_dispatch branch.

@whitequark
Copy link
Contributor

Looks all good! I have a few nits but no real issues.

@whitequark
Copy link
Contributor

In fact I went ahead and merged it.

@whitequark
Copy link
Contributor

I propose I rename that I called SocketDispatcher to SocketDispatchTable, put it inside SocketSet

Ack on putting it into SocketSet.

@whitequark
Copy link
Contributor

Moving forward, do you think you can cover EthernetInterface with tests? A basic coverage of every Ok and Err returned from the process_* functions would be a great start, I'll chime in then, adding support for the newly landed rustc support for gcov.

@batonius
Copy link
Contributor Author

batonius commented Jun 26, 2017

Moving forward, do you think you can cover EthernetInterface with tests?

Sure, I'll look into it.

@whitequark
Copy link
Contributor

Something I just remembered that might be very relevant to your work is that having several open sockets in LISTEN state with the same local endpoint is perfectly legal. That's how listen backlog is implemented (by a layer on top of smoltcp).

@batonius
Copy link
Contributor Author

Right, I somehow missed that point completely, it's not enough to dispatch tcp packets based on the dst endpoint, a server can have several clients connected to it, we need to consider src endpoint as well, and we don't know it until we established a connection. This means we need another layer of dispatching and a way for a socket to report the fact it has established a connection to a remote endpoint.

@batonius
Copy link
Contributor Author

Now I think if it, it should be easy enough to do by checking if socket's remote_endpoit has changed after process in process_tcpv4.

@whitequark
Copy link
Contributor

Yeah that works.

@batonius
Copy link
Contributor Author

I've updated master...batonius:packet_dispatch .

I'm not sure about the API yet, but my current plan is to add a smart pointer-like type that would hold mutable references both to Socket and DispatchTable and would implement Deref<Target=Socket> . SocketSet::get would return this smart pointer instead of a raw reference. The pointer should be able to check if Socket's endpoints have changed in drop, and if they have, update DispatcheTable accordingly. This way it would be possible to add an unbound socket to SocketSet and then to automatically add it to DispatchTable on bind or connect.

I'm also thinking of overriding bind and likes in this smart pointer so it could return errors like AlreadyInUse, which is not possible now.

@whitequark
Copy link
Contributor

Hmm, I'm not sure if I like this idea very much, we already have drop magic in Device and that's pretty bad already. But I can give it a look.

@batonius
Copy link
Contributor Author

I've got it working: master...batonius:packet_dispatch .

I'm not sure about names, and I think there are some public API ready to be cut out, there are no tests, the formatting is wrong, but overall this is the design I propose we adopt. Examples appear to be working, but I can't test them in no_std mode.

Smart-pointer approach works fine, I've even managed to implement it without RefCell, API changes are minimal, no_std mode is supported, adding egress dispatching should be easy.

@batonius
Copy link
Contributor Author

I've cleaned up the code, and I'm more or less happy with it. If you're ok with the approach I've taken, I'll create a PR so you can review the code and request changes.

@batonius
Copy link
Contributor Author

batonius commented Jul 2, 2017

Added egress packet dispatching, the only thing I'm not sure about is the best way to determine if a TcpSocket is dirty. Right now I consider every TCP socket not in the CLOSED or LISTEN states as dirty, but limit the number of iterations over the dirty sockets list so we don't get an infinite loop. Overwise it appears to be working.

@batonius
Copy link
Contributor Author

batonius commented Jul 2, 2017

Fixed Travis builds. It looks like the collections crate has been merged into alloc, we should probably get rid of it too.

@whitequark
Copy link
Contributor

I'm in the middle of a fairly major refactoring of the TCP socket dispatch (it shouldn't impact your work much if at all), so I'll do a review once I finish that.

@batonius
Copy link
Contributor Author

batonius commented Jul 2, 2017

Good. I'm gonna look into these tests for EthernetInterface, it's not as straightforward as I thought though.

@batonius
Copy link
Contributor Author

Any progress on reviewing this? I keep rebasing the branch just in case.

@whitequark
Copy link
Contributor

Still working on that refactoring.

@whitequark
Copy link
Contributor

I'm done with the refactoring, I think you will find that the new code is significantly better.

@batonius
Copy link
Contributor Author

Nice. Could you have a brief look at my branch to see if you're ok with the overall design before I start rebasing it?

@whitequark
Copy link
Contributor

Your egress packet handling will have to be redone. One thing I'm working on right now is having poll compute the earliest interval by which it should be called next in absence of inbound packets. On hosted platforms this lets me use select or whatever, and on embedded platforms this will let me go in deep sleep very efficiently.

So what should happen instead is every socket tells this interval for itself (it's either zero or infinity for UDP and raw sockets, and the retransmit timer derived vlaues for TCP sockets), and you sould use some sort of data structure to sort them and select the minimal one, maybe a BTreeMap also.

Also, I would really prefer it if you broke up the branch into many small PRs. For example, you can start with the expandable ring buffer (this may well come useful in the future for TCP!), that's not dependent on anything and I can just merge it right now with minor adjustments instead of keeping reviewing it in the mega-PR.

@whitequark
Copy link
Contributor

OK, I've very much warmed up to your smart pointer idea after thinking more about it. Essentially, existing interface (socket_set.get_mut().as_socket::<>()) already places the exact same restrictions (a &mut borrow of SocketSet) and is somewhat more ugly. So, let's get a few gradual, uncontroversional changes in!

  • The split of process into would_accept and process is OK, except I'd like to have a debug_assert! in the latter. After you do that, please kill Error::Rejected with extreme prejudice, it should have never existed.
  • Let's kill AsSocket completely too. Instead of that horrible trait, let's have a SocketRef<T> that dereferences to &mut T, stores a &mut SocketSet inside it, and it's all for now. (I think this corresponds to your SocketTracker). Then, socket_set.get(socket_handle) should be altered to return SocketRef<T>, or let's even just have three functions get_raw_socket &etc.

After that I have some thoughts on what to do next, but am too tired to describe them in full.

@whitequark
Copy link
Contributor

I've merged #38. Looking forward, can you start by adding the "smart pointers", but without any ingress/egress handling code? That seems like a good additive path forward.

@batonius
Copy link
Contributor Author

batonius commented Sep 8, 2017

Sure. Is there a point in rebasing #34 after you've refactored RingBuffer, or should I just close it? It looks like the only thing worth keeping is 3da43fe .

@whitequark
Copy link
Contributor

whitequark commented Jun 22, 2019

Looking forward, can you start by adding the "smart pointers", but without any ingress/egress handling code? That seems like a good additive path forward.

There is unfortunately a serious problem here. Namely, SocketRef::into_inner is unsound, because you can't move a field out (or destructure) a type that implements Drop. This used to not be detected by Rust, but as of today it causes an ICE.

@JakkuSakura
Copy link
Contributor

I couldn't wait for the RFC, so I rolled out my own. TcpSocket is excluded from Item, and listen is never used in my program.

/// A handle specified for TCP
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Default, Hash)]
#[cfg_attr(feature = "defmt", derive(defmt::Format))]
pub struct TcpHandle(IpEndpoint, IpEndpoint);
impl TcpHandle {
    pub fn new(mut local: IpEndpoint, remote: IpEndpoint) -> Self {
        local.addr = Address::Unspecified;
        Self(local, remote)
    }
    pub fn local(self) -> IpEndpoint {
        self.0
    }
    pub fn remote(self) -> IpEndpoint {
        self.1
    }
}

/// An extensible set of sockets.
///
/// The lifetime `'a` is used when storing a `Socket<'a>`.
#[derive(Debug)]
pub struct Set<'a> {
    sockets: ManagedSlice<'a, Option<Item<'a>>>,
    #[cfg(feature = "socket-tcp")]
    tcp_sockets: HashMap<TcpHandle, TcpSocket<'a>>,
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

4 participants