-
Notifications
You must be signed in to change notification settings - Fork 446
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix retransmission and send offset overflow #65
Conversation
Thanks. Can you add tests? |
0280597
to
700ddd0
Compare
Good idea, there was one more problem I could spot through the tests: When it continued to send data, it was using a wrong (too low) sequence number. |
src/wire/tcp.rs
Outdated
@@ -10,7 +10,7 @@ use super::ip::checksum; | |||
/// | |||
/// A sequence number is a monotonically advancing integer modulo 2<sup>32</sup>. | |||
/// Sequence numbers do not have a discontiguity when compared pairwise across a signed overflow. | |||
#[derive(Debug, PartialEq, Eq, Clone, Copy, Default)] | |||
#[derive(Debug, PartialEq, Eq, Ord, Clone, Copy, Default)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No no no, this is very intentional! Sequence numbers are not totally ordered! They wrap!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes. Then by just looking at the two numbers there is no way to distinguish between the case of normal operation with local_seq_no just wrapped before the max statement and the retransmission case. There is also a wrapped case for retransmission where the minimum would have to be taken. So first a way to find out if the value of remote_last_seq was coming from a retransmission is needed, then it could be updated directly and there is no min/max involved.
I'll need to take a closer look at these changes. I'm not yet completely convinced they're correct. But, they do seem to go in the right direction. |
700ddd0
to
2a69903
Compare
This is roughly how I would solve it now. Instead of the casts it could use PartialOrd::partial_cmp. (N1) Normal with local_seq_no < remote_last_seq and (N2) normal with remote_last_seq < local_seq_no. (Wrapping happened already for remote_last_seq in N2.) (R1) Retransmission with remote_last_seq < local_seq_no and (R2) retransmission with local_seq_no < remote_last_seq. (Wrapping happend just now for local_seq_no in R2) Looking at old_local_seq_no helps to distinguish them, because in the wrapped cases it is guaranteed to be in the interval of the other two numbers while in the non-wrapped cases the opposite holds. It looks ugly rightnow and needs to presented in a better way. I think it is correct, would you agree that handling these cases is an acceptable approach? |
530489d
to
8196b7c
Compare
Maybe this is now a bit more readable? I have included the wrapping cases in the test. In wire.rs I had to make the two wrapping subtractions explicit because otherwise sending gave panics in debug mode. |
d31d8e7
to
3888f11
Compare
src/socket/tcp.rs
Outdated
(self.local_seq_no < old_local_seq_no && old_local_seq_no < self.remote_last_seq) { | ||
// These two cases describe retransmission where remote_last_seq lags behind local_seq_no. | ||
// The value for the next sequence number to be used is updated. | ||
self.remote_last_seq = self.local_seq_no; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can remove all empty cases and the unreachable which I added to see if something was missed. Then if would just be the following:
if (self.remote_last_seq < self.local_seq_no &&
!(self.remote_last_seq < old_local_seq_no && old_local_seq_no < self.local_seq_no)) ||
(self.local_seq_no < old_local_seq_no && old_local_seq_no < self.remote_last_seq) {
self.remote_last_seq = self.local_seq_no;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am sure that you have discovered a real issue, but I do not seem to understand your explanation. Do you think you could expand on it or rephrase it in a different way?
3888f11
to
aaf9ca5
Compare
src/socket/tcp.rs
Outdated
(State::Established, TcpControl::None) => { | ||
self.timer.set_for_idle(timestamp, self.keep_alive); | ||
match self.timer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems obviously correct and OK to merge. Do you think you can extract this part and a dedicated test, and submit a PR for it?
Also, would you mind rewriting the condition a bit? I would add a Timer::is_retransmit() -> bool
function, and write the condition as if !timer.is_retransmit() || ack_len != 0 { self.timer.set_for_idle(...) }
src/wire/tcp.rs
Outdated
@@ -51,13 +51,13 @@ impl ops::Sub for SeqNumber { | |||
type Output = usize; | |||
|
|||
fn sub(self, rhs: SeqNumber) -> usize { | |||
(self.0 - rhs.0) as usize | |||
self.0.wrapping_sub(rhs.0) as usize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test for this?
aaf9ca5
to
bf1ffd7
Compare
Hello, |
src/socket/tcp.rs
Outdated
// ACKs are received, both for regular and retransmitted data. | ||
for local_seq_start in vec![LOCAL_SEQ, TcpSeqNumber(i32::MAX - 7), | ||
TcpSeqNumber(i32::MAX - 6), TcpSeqNumber(i32::MAX - 5), TcpSeqNumber(i32::MAX - 4), | ||
TcpSeqNumber(i32::MAX - 3), TcpSeqNumber(i32::MAX - 2)] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand how wrapping is relevant here. TCP sequence numbers are compared modulo 2³². Specifically, TcpSeqNumber(i32::MAX - 6) - TcpSeqNumber(i32::MAX - 3)
equals TcpSeqNumber(-6) - TcpSeqNumber(-3)
.
Can you show me a test case that results in an error only when the sequence number wraps? If one exists, it means there is a bug somewhere else apart from the retransmission code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The overflow assertions are fixed in the other commit with an extra test: 3480238 (but are not shown by github anymore since the master merge).
Here the test also includes wrapping and non-wrapping cases to cover the constellations of local_seq_no
, old_local_seq_no
and remote_last_seq
.
When the ACK arrives after the first packet retransmitted for MAX-7 we have: old_local_seq_no < remote_last_seq < local_seq_no
For MAX-6 local_seq_no
already wraps to: local_seq_no < old_local_seq_no < remote_last_seq
For MAX-3 local_seq_no
and remote_last_seq
now wrapped to: remote_last_seq < local_seq_no < old_local_seq_no
In all these cases the value of remote_last_seq
will be updated.
The last send macro sends an ACK for a normal transmission and the idea was that also the constellations are permuted for this condition where no update should happen. But I just realized that tx_buffer.len() == 0
is not enough to ensure that this is the case, so I can add some more data to be transfered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, now saw the picked commit, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expanded the test coverage for normal transmission where remote_last_seq should not be touched.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(The example project mentioned in the issue can also be used to reproduce this, but it does not check the contents sent as the test cases do: https://github.com/pothos/taptest)
40133c9
to
d51f854
Compare
During the retransmission, higher sequence numbers than the last one calculated after (re)sending could be acknowledged. This led to an overflow in the offset calculation, which resulted in no further data being sent. Also, the wrong sequence number would have been used for following packets. Now the sequence number to be used for sending is updated accordingly when the local sequence number is changed.
d51f854
to
595746f
Compare
You have correctly identified the root cause of the problem however your patch is overly complicated. There is no need to test for wraparound since that is already handled by the |
Resolves #62
Duplicate ACKs should not replace the retransmission timer
but otherwise, as normal ACKs, set the keep-alive timer.
During the retransmission, higher sequence numbers than the
last one sent could be acknowledged. This led to an overflow
in the offset calculation, which resulted in no further data
being sent. Now the maximum is used to correctly set it to 0.