Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP. Don't merge] Resilience to error in packet decoding #285

Closed
wants to merge 6 commits into from

Conversation

HeroicKatora
Copy link
Contributor

@HeroicKatora HeroicKatora commented Apr 11, 2019

Addresses #281. Work in progress. Addresses DOS by errors in the decoding step by establishing an error ring-buffer associated to the ethernet interface and socket type. When some decoding step fails, the error is put into the most specific error buffer available but otherwise ignored. This ensures that faulty packets can not affect further packet processing. This is a work in progress, intended for an early preview of ideas. As such, it may be incomplete, not compile at times and/or be buggy. I may possibly rebase or force-push the branch at will.

A second step of the mechanism design rework will try to rebalance the recv/send inbalance in case of a saturated phy. As layed out in the issue report, a full receive buffer can adversely affect send performance by blocking further sends. The error from the full buffer should break the receive part of the loop but it's not completely clear yet how to handle finally handle it. One possibility is putting it into the ring buffer or the ethernet iface, but this of course creates some friction with the return value a caller expects.

Alternatives

Callbacks have been evaluated as inferior for major reasons:

  • It's missing contextual information resulting from partial decoding. A failed udp decoding could for example still access and log the frames around itself. Re-decoding within the error callback is both inefficient, creates ambiguity and is awkward as it is effectively invoking a processing step that it expects to fail. This seems highly unintuitive.
  • Invoking the callback during polling means that constraints such as terminiation, timing, resource usage, ... need to be fulfilled by arbitrary user code. Without an extremely extensive documentation, it remains unclear how one could expect the programmer of the callback to know the internals enough to make good engineering decisions about its design.
  • The callback must fight the borrow checker if it wants to inspect any state such as the socket on which the error originated or the phy interface. Making error inspection a separate call makes this cleaner, less constrained and more efficient.
  • Ignoring most errors, as many usages likely want to, would become a no-op for the caller.

TODO

  • Cleanup some of the changes that were made just to quickly iterate over alternative designs for this feature. This is still ongoing

  • Make the code compile under a subset of features such as (socket-udp, proto-ipv4). This is actually an issue with the current library and I don't see any guide on which feature set should be supported other than all and default.

  • Implement error buffers for more than udp and decide on a strategy to handle left-over errors that don't belong to any socket.

  • Evaluate and choose the strategy for recv/send imbalance.

  • Implement tests to simulate and ensure resilience against the DOS potential presented in Ethernet denial of service #281.

@whitequark
Copy link
Contributor

It's an interesting approach. It is a bit awkward, and I don't like that. However, I agree that it seems like our best way going forward, so let's use it.

I think your lifetime changes would need to be a part of a separate PR. They change the public interface as well. I haven't looked at them closely yet as it's hard to follow here.

@HeroicKatora
Copy link
Contributor Author

Yes, it's all a bit raw and several changes are mixed together. I plan to split this up into several independent chunks once I have determined the overall usability.

@HeroicKatora
Copy link
Contributor Author

@whitequark Cleanly rebased only the error buffer related changes. Note that the lifetime within iface and udp is currently linked to other attributes as I wasn't sure what the original motivation was for having some of them overlap and some of them exposed in the generic parameters. Some feedback on this would be helpful. And, this still needs to be performed for the other socket types to ensure that really no error is generated. I finally may want to remove the Result<_, Error> part from the inner methods to make this explicit.

This is a ring buffer storing the failure states of previously received
(and dropped) packet. The buffer will always overwrite the oldest errors
when it is full. An empty buffer, for example Default constructed, will
simply drop all errors. The buffer will keep a count of the number of
ignored errors.

Overall, this makes it possible to indicate more than one failure
condition per call to the library user. This is necessary to not
terminate a `poll` call immediately on the first error. Stored errors
can later be retrieved and inspected outside and independently of the
call that created them.

An instance of the buffer is added to the udp socket but largely unused
at the moment.
Augments `ErrorBuffer` with a callback based consumer for all pending
errors.
Here is why a buffer for errors is needed: The `poll` method of the
ethernet interface may potentially produce more than one signalling
error. But breaking out of control flow leaves the user without a proper
continuation option. A renewed call to `poll` would instead start
iterating sockets and packet flow in the previous order again, giving
unfair advantage to the flows called first. Specifically, the inbound
flow may also be controlled by a malicious party and error during
parsing and reception must not affect other flows.

Instead, we will try to store these errors inside a buffer provided in
the builder so that the programmer can control the number of errors they
want to inspect. Overflowing errors are dropped entirely.

This is part of a patch series introducing that error handling to all
socket types.
Adds methods for interacting with an ErrorBuffer similar to the methods
provided on Result itself. But instead of panicking or ignoring an Err
variant the contained error is pushed into the buffer.
Parsing incoming packets should not affect control flow for all other
packet flows that are to be handled, to guarantee that such packets can
not influence throughput adversely. This change will instead push them
into the error buffers of the ethernet interface or udp socket,
depending on how far parsing succeeded.

This is part of a patch series to harden packet handling against
adverserial network conditions and an overloaded network stack. Other
commits change the handling for the other socket and protocol types.
Where tests previously asserted that `process` completed without an
error they will no call process and assert that no error has been
created in the error buffer afterwards. The check does not depend on an
error buffer with non-zero capacity. But when the check has previously
checked for an error case, add an actual error buffer with capacity for
a single element and assert the same condition on the result of `pop`.
Since tests are short enough, the result of these operations can not
wrongly report the error condition of a previous packet processing.
@m-labs-homu
Copy link

☔ The latest upstream changes (presumably 7c606ac) made this pull request unmergeable. Please resolve the merge conflicts.

@HeroicKatora
Copy link
Contributor Author

Just as a head up: The require changes are far too extensive to justify working on for me. I'm going to abandon this PR for good.

@whitequark
Copy link
Contributor

I see. That's a bit unfortunate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants