aboutgitcodebugslistschat
diff options
context:
space:
mode:
authorStefano Brivio <sbrivio@redhat.com>2025-01-16 20:06:59 +0100
committerStefano Brivio <sbrivio@redhat.com>2025-01-16 21:15:33 +0100
commit707f77b0a93160c8695b3cf5bfd7c24d9992b106 (patch)
treeeb1044580ceab023168821f58e9cc66b53d1ffcd
parent1b95bd6fa1148f3609bebf7b2bcd6d47376e61a6 (diff)
downloadpasst-707f77b0a93160c8695b3cf5bfd7c24d9992b106.tar
passt-707f77b0a93160c8695b3cf5bfd7c24d9992b106.tar.gz
passt-707f77b0a93160c8695b3cf5bfd7c24d9992b106.tar.bz2
passt-707f77b0a93160c8695b3cf5bfd7c24d9992b106.tar.lz
passt-707f77b0a93160c8695b3cf5bfd7c24d9992b106.tar.xz
passt-707f77b0a93160c8695b3cf5bfd7c24d9992b106.tar.zst
passt-707f77b0a93160c8695b3cf5bfd7c24d9992b106.zip
tcp: Fix ACK sequence getting out of sync on EPOLLOUT wake-up
In the next patches, I'm extending the usage of STALLED to a few more cases. Doing so revealed this issue: if we set STALLED and, consequently, EPOLLOUT (which is wrong, fixed later) right after we set a connection to ESTABLISHED (which also happened by mistake while I was preparing another change), with the guest sending data together with the final ACK in the handshake, say: 41.3661: vhost-user: got kick_data: 0000000000000001 idx: 1 41.3662: Flow 2 (NEW): FREE -> NEW 41.3663: Flow 2 (INI): NEW -> INI 41.3663: Flow 2 (INI): TAP [2a01:4f8:222:904::2]:52536 -> [2001:db8:9a55::1]:10003 => ? 41.3665: Flow 2 (TGT): INI -> TGT 41.3666: Flow 2 (TGT): TAP [2a01:4f8:222:904::2]:52536 -> [2001:db8:9a55::1]:10003 => HOST [::]:0 -> [2001:db8:9a55::1]:10003 41.3667: Flow 2 (TCP connection): TGT -> TYPED 41.3667: Flow 2 (TCP connection): TAP [2a01:4f8:222:904::2]:52536 -> [2001:db8:9a55::1]:10003 => HOST [::]:0 -> [2001:db8:9a55::1]:10003 41.3669: Flow 2 (TCP connection): TAP_SYN_RCVD: CLOSED -> SYN_SENT 41.3670: Flow 2 (TCP connection): Side 0 hash table insert: bucket: 339814 41.3672: Flow 2 (TCP connection): TYPED -> ACTIVE 41.3673: Flow 2 (TCP connection): TAP [2a01:4f8:222:904::2]:52536 -> [2001:db8:9a55::1]:10003 => HOST [::]:0 -> [2001:db8:9a55::1]:10003 41.3674: Flow 2 (TCP connection): TAP_SYN_ACK_SENT: SYN_SENT -> SYN_RCVD 41.3675: Flow 2 (TCP connection): ACK_FROM_TAP_DUE 41.3675: Flow 2 (TCP connection): timer expires in 10.000s 41.3675: vhost-user: got kick_data: 0000000000000001 idx: 1 41.3676: Flow 2 (TCP connection): ACK_FROM_TAP_DUE dropped 41.3676: Flow 2 (TCP connection): ESTABLISHED: SYN_RCVD -> ESTABLISHED 41.3678: Flow 2 (TCP connection): STALLED 41.3678: vhost-user: got kick_data: 0000000000000002 idx: 1 41.3679: Flow 2 (TCP connection): ACK_TO_TAP_DUE 41.3680: Flow 2 (TCP connection): timer expires in 0.010s 41.3680: Flow 2 (TCP connection): STALLED dropped we'll immediately get an EPOLLOUT event, call tcp_update_seqack_wnd(), but ignore window and ACK sequence update. At this point, we think we acknowledged all the data to the guest (but we didn't) and we'll happily proceed to clear the ACK_TO_TAP_DUE flag: 41.3780: Flow 2 (TCP connection): ACK_TO_TAP_DUE dropped 41.3780: Flow 2 (TCP connection): timer expires in 7200.000s 41.5754: vhost-user: got kick_data: 0000000000000001 idx: 1 41.9956: vhost-user: got kick_data: 0000000000000001 idx: 1 42.8275: vhost-user: got kick_data: 0000000000000001 idx: 1 while the guest starts retransmitting that data desperately, without ever getting an ACK segment from us: 1433 38.746353 2a01:4f8:222:904::2 → 2001:db8:9a55::1 94 TCP 54312 → 10003 [SYN] Seq=0 Win=65460 Len=0 MSS=65460 SACK_PERM TSval=1089126192 TSecr=0 WS=128 1434 38.747357 2001:db8:9a55::1 → 2a01:4f8:222:904::2 82 TCP 10003 → 54312 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=61440 WS=256 1435 38.747500 2a01:4f8:222:904::2 → 2001:db8:9a55::1 74 TCP 54312 → 10003 [ACK] Seq=1 Ack=1 Win=65536 Len=0 1436 38.747769 2a01:4f8:222:904::2 → 2001:db8:9a55::1 8266 TCP 54312 → 10003 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=8192 1437 38.747798 2a01:4f8:222:904::2 → 2001:db8:9a55::1 32841 TCP 54312 → 10003 [ACK] Seq=8193 Ack=1 Win=65536 Len=32767 1438 38.748049 2001:db8:9a55::1 → 2a01:4f8:222:904::2 74 TCP [TCP Window Update] 10003 → 54312 [ACK] Seq=1 Ack=1 Win=65280 Len=0 1439 38.954044 2a01:4f8:222:904::2 → 2001:db8:9a55::1 8266 TCP [TCP Retransmission] 54312 → 10003 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=8192 1440 39.370096 2a01:4f8:222:904::2 → 2001:db8:9a55::1 8266 TCP [TCP Retransmission] 54312 → 10003 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=8192 1441 40.202135 2a01:4f8:222:904::2 → 2001:db8:9a55::1 8266 TCP [TCP Retransmission] 54312 → 10003 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=8192 because seq_ack_to_tap is already set to the sequence after frame number 1437 in the example. For some reason, I could only reproduce this with vhost-user, IPv6, and passt running under valgrind while taking captures. Even under these conditions, it happens quite rarely. Forcibly send an ACK segment if we update the ACK sequence (or the advertised window). Fixes: e5eefe77435a ("tcp: Refactor to use events instead of states, split out spliced implementation") Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
-rw-r--r--tcp.c6
1 files changed, 4 insertions, 2 deletions
diff --git a/tcp.c b/tcp.c
index ec433f7..72fca63 100644
--- a/tcp.c
+++ b/tcp.c
@@ -2200,8 +2200,10 @@ void tcp_sock_handler(const struct ctx *c, union epoll_ref ref,
if (events & EPOLLIN)
tcp_data_from_sock(c, conn);
- if (events & EPOLLOUT)
- tcp_update_seqack_wnd(c, conn, false, NULL);
+ if (events & EPOLLOUT) {
+ if (tcp_update_seqack_wnd(c, conn, false, NULL))
+ tcp_send_flag(c, conn, ACK);
+ }
return;
}