diff options
author | Stefano Brivio <sbrivio@redhat.com> | 2025-01-16 20:06:59 +0100 |
---|---|---|
committer | Stefano Brivio <sbrivio@redhat.com> | 2025-01-16 21:15:33 +0100 |
commit | 707f77b0a93160c8695b3cf5bfd7c24d9992b106 (patch) | |
tree | eb1044580ceab023168821f58e9cc66b53d1ffcd | |
parent | 1b95bd6fa1148f3609bebf7b2bcd6d47376e61a6 (diff) | |
download | passt-707f77b0a93160c8695b3cf5bfd7c24d9992b106.tar passt-707f77b0a93160c8695b3cf5bfd7c24d9992b106.tar.gz passt-707f77b0a93160c8695b3cf5bfd7c24d9992b106.tar.bz2 passt-707f77b0a93160c8695b3cf5bfd7c24d9992b106.tar.lz passt-707f77b0a93160c8695b3cf5bfd7c24d9992b106.tar.xz passt-707f77b0a93160c8695b3cf5bfd7c24d9992b106.tar.zst passt-707f77b0a93160c8695b3cf5bfd7c24d9992b106.zip |
tcp: Fix ACK sequence getting out of sync on EPOLLOUT wake-up
In the next patches, I'm extending the usage of STALLED to a few more
cases.
Doing so revealed this issue: if we set STALLED and, consequently,
EPOLLOUT (which is wrong, fixed later) right after we set a connection
to ESTABLISHED (which also happened by mistake while I was preparing
another change), with the guest sending data together with the final
ACK in the handshake, say:
41.3661: vhost-user: got kick_data: 0000000000000001 idx: 1
41.3662: Flow 2 (NEW): FREE -> NEW
41.3663: Flow 2 (INI): NEW -> INI
41.3663: Flow 2 (INI): TAP [2a01:4f8:222:904::2]:52536 -> [2001:db8:9a55::1]:10003 => ?
41.3665: Flow 2 (TGT): INI -> TGT
41.3666: Flow 2 (TGT): TAP [2a01:4f8:222:904::2]:52536 -> [2001:db8:9a55::1]:10003 => HOST [::]:0 -> [2001:db8:9a55::1]:10003
41.3667: Flow 2 (TCP connection): TGT -> TYPED
41.3667: Flow 2 (TCP connection): TAP [2a01:4f8:222:904::2]:52536 -> [2001:db8:9a55::1]:10003 => HOST [::]:0 -> [2001:db8:9a55::1]:10003
41.3669: Flow 2 (TCP connection): TAP_SYN_RCVD: CLOSED -> SYN_SENT
41.3670: Flow 2 (TCP connection): Side 0 hash table insert: bucket: 339814
41.3672: Flow 2 (TCP connection): TYPED -> ACTIVE
41.3673: Flow 2 (TCP connection): TAP [2a01:4f8:222:904::2]:52536 -> [2001:db8:9a55::1]:10003 => HOST [::]:0 -> [2001:db8:9a55::1]:10003
41.3674: Flow 2 (TCP connection): TAP_SYN_ACK_SENT: SYN_SENT -> SYN_RCVD
41.3675: Flow 2 (TCP connection): ACK_FROM_TAP_DUE
41.3675: Flow 2 (TCP connection): timer expires in 10.000s
41.3675: vhost-user: got kick_data: 0000000000000001 idx: 1
41.3676: Flow 2 (TCP connection): ACK_FROM_TAP_DUE dropped
41.3676: Flow 2 (TCP connection): ESTABLISHED: SYN_RCVD -> ESTABLISHED
41.3678: Flow 2 (TCP connection): STALLED
41.3678: vhost-user: got kick_data: 0000000000000002 idx: 1
41.3679: Flow 2 (TCP connection): ACK_TO_TAP_DUE
41.3680: Flow 2 (TCP connection): timer expires in 0.010s
41.3680: Flow 2 (TCP connection): STALLED dropped
we'll immediately get an EPOLLOUT event, call tcp_update_seqack_wnd(),
but ignore window and ACK sequence update. At this point, we think we
acknowledged all the data to the guest (but we didn't) and we'll
happily proceed to clear the ACK_TO_TAP_DUE flag:
41.3780: Flow 2 (TCP connection): ACK_TO_TAP_DUE dropped
41.3780: Flow 2 (TCP connection): timer expires in 7200.000s
41.5754: vhost-user: got kick_data: 0000000000000001 idx: 1
41.9956: vhost-user: got kick_data: 0000000000000001 idx: 1
42.8275: vhost-user: got kick_data: 0000000000000001 idx: 1
while the guest starts retransmitting that data desperately, without
ever getting an ACK segment from us:
1433 38.746353 2a01:4f8:222:904::2 → 2001:db8:9a55::1 94 TCP 54312 → 10003 [SYN] Seq=0 Win=65460 Len=0 MSS=65460 SACK_PERM TSval=1089126192 TSecr=0 WS=128
1434 38.747357 2001:db8:9a55::1 → 2a01:4f8:222:904::2 82 TCP 10003 → 54312 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=61440 WS=256
1435 38.747500 2a01:4f8:222:904::2 → 2001:db8:9a55::1 74 TCP 54312 → 10003 [ACK] Seq=1 Ack=1 Win=65536 Len=0
1436 38.747769 2a01:4f8:222:904::2 → 2001:db8:9a55::1 8266 TCP 54312 → 10003 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=8192
1437 38.747798 2a01:4f8:222:904::2 → 2001:db8:9a55::1 32841 TCP 54312 → 10003 [ACK] Seq=8193 Ack=1 Win=65536 Len=32767
1438 38.748049 2001:db8:9a55::1 → 2a01:4f8:222:904::2 74 TCP [TCP Window Update] 10003 → 54312 [ACK] Seq=1 Ack=1 Win=65280 Len=0
1439 38.954044 2a01:4f8:222:904::2 → 2001:db8:9a55::1 8266 TCP [TCP Retransmission] 54312 → 10003 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=8192
1440 39.370096 2a01:4f8:222:904::2 → 2001:db8:9a55::1 8266 TCP [TCP Retransmission] 54312 → 10003 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=8192
1441 40.202135 2a01:4f8:222:904::2 → 2001:db8:9a55::1 8266 TCP [TCP Retransmission] 54312 → 10003 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=8192
because seq_ack_to_tap is already set to the sequence after frame
number 1437 in the example.
For some reason, I could only reproduce this with vhost-user, IPv6,
and passt running under valgrind while taking captures. Even under
these conditions, it happens quite rarely.
Forcibly send an ACK segment if we update the ACK sequence (or the
advertised window).
Fixes: e5eefe77435a ("tcp: Refactor to use events instead of states, split out spliced implementation")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
-rw-r--r-- | tcp.c | 6 |
1 files changed, 4 insertions, 2 deletions
@@ -2200,8 +2200,10 @@ void tcp_sock_handler(const struct ctx *c, union epoll_ref ref, if (events & EPOLLIN) tcp_data_from_sock(c, conn); - if (events & EPOLLOUT) - tcp_update_seqack_wnd(c, conn, false, NULL); + if (events & EPOLLOUT) { + if (tcp_update_seqack_wnd(c, conn, false, NULL)) + tcp_send_flag(c, conn, ACK); + } return; } |