author David Gibson <david@gibson.dropbear.id.au> 2025-02-27 16:55:17 +1100
committer Stefano Brivio <sbrivio@redhat.com> 2025-02-28 01:32:38 +0100
commit 52419a64f2dfa31707b31148e6a311bb57be6e5f
tree c3543061ce32ffece9656e3e77b090d0fb2b8280
parent b2708218a6eec82fad98da52d7569d13cf35e05c
migrate, tcp: Don't flow_alloc_cancel() during incoming migration
In tcp_flow_migrate_target(), if we're unable to create and bind the new socket, we print an error, cancel the flow and carry on. This seems to make sense based on our policy of generally letting the migration complete even if some or all flows are lost in the process. But it doesn't quite work: the flow_alloc_cancel() means that the flows in the target's flow table are no longer a one-to-one match for the flows which the source is sending data for. This means that data for later flows will be mismatched to a different flow. Most likely that will cause some nasty error later, but even worse, it might appear to succeed yet lead to data corruption due to incorrectly restoring one of the flows.

Instead, we should leave the flow in the table until we've read all the data for it, *then* discard it. Technically, removing the flow_alloc_cancel() would be enough for this: if tcp_flow_repair_socket() fails, it leaves conn->sock == -1, which will cause the restore functions in tcp_flow_migrate_target_ext() to fail, discarding the flow. To make what's going on clearer (and with fewer extraneous error messages), put several explicit tests for a missing socket later in the migration path, so that we read the data associated with the flow but explicitly discard it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
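The mismatch described above can be illustrated with a toy model. This is only a sketch, not passt code: struct toy_flow, struct toy_table, toy_first_pass() and toy_second_pass() are hypothetical names, and the model assumes that the second pass pairs the k-th record from the source with the k-th entry in the target's table. The mismatch counter is a diagnostic of the toy only; a real target has no flow identity to compare against.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FLOWS 8

/* Hypothetical stand-in for a migrated flow, not passt's struct tcp_tap_conn */
struct toy_flow {
	int id;		/* identity announced by the source in the first pass */
	int sock;	/* -1 if socket setup failed on the target */
	int payload;	/* per-flow data applied in the second pass */
	bool live;	/* false once the flow has been discarded */
};

struct toy_table {
	struct toy_flow flows[MAX_FLOWS];
	size_t count;
};

/* First pass: one table entry per announced flow. With cancel_on_failure set,
 * a flow whose socket can't be set up never lands in the table (the behaviour
 * being removed); otherwise it is kept with sock == -1.
 */
static void toy_first_pass(struct toy_table *t, const int *ids,
			   const bool *sock_ok, size_t n,
			   bool cancel_on_failure)
{
	size_t i;

	t->count = 0;
	for (i = 0; i < n; i++) {
		struct toy_flow *f;

		if (!sock_ok[i] && cancel_on_failure)
			continue;

		assert(t->count < MAX_FLOWS);
		f = &t->flows[t->count++];
		f->id = ids[i];
		f->sock = sock_ok[i] ? 100 + (int)i : -1;
		f->payload = 0;
		f->live = true;
	}
}

/* Second pass: the source sends one record per announced flow, in order.
 * Returns how many records were paired with the wrong entry, or with none.
 */
static size_t toy_second_pass(struct toy_table *t, const int *ids,
			      const int *payloads, size_t n)
{
	size_t i, mismatched = 0;

	for (i = 0; i < n; i++) {
		struct toy_flow *f;

		if (i >= t->count) {	/* no table entry left for this record */
			mismatched++;
			continue;
		}

		f = &t->flows[i];
		if (f->id != ids[i])	/* toy-only check, see above */
			mismatched++;

		if (f->sock < 0) {	/* record consumed, *then* flow discarded */
			f->live = false;
			continue;
		}

		f->payload = payloads[i];
	}

	return mismatched;
}
```

With three flows where only the second one fails socket setup, cancelling in the first pass leaves a two-entry table, so the third flow's entry receives the second flow's data. Keeping the failed entry until its record has been read keeps every later record aligned.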
-rw-r--r-- tcp.c 19
1 file changed, 16 insertions, 3 deletions
diff --git a/tcp.c b/tcp.c
index d23b6d9..b3aa9a2 100644
--- a/tcp.c
+++ b/tcp.c
@@ -2708,6 +2708,9 @@ int tcp_flow_repair_on(struct ctx *c, const struct tcp_tap_conn *conn)
 {
 	int rc = 0;
 
+	if (conn->sock < 0)
+		return 0;
+
 	if ((rc = repair_set(c, conn->sock, TCP_REPAIR_ON)))
 		err("Failed to set TCP_REPAIR");
 
@@ -2725,6 +2728,9 @@ int tcp_flow_repair_off(struct ctx *c, const struct tcp_tap_conn *conn)
 {
 	int rc = 0;
 
+	if (conn->sock < 0)
+		return 0;
+
 	if ((rc = repair_set(c, conn->sock, TCP_REPAIR_OFF)))
 		err("Failed to clear TCP_REPAIR");
 
@@ -3377,7 +3383,8 @@ int tcp_flow_migrate_target(struct ctx *c, int fd)
 
 	if ((rc = tcp_flow_repair_socket(c, conn))) {
 		flow_err(flow, "Can't set up socket: %s, drop", strerror_(-rc));
-		flow_alloc_cancel(flow);
+		/* Can't leave the flow in an incomplete state */
+		FLOW_ACTIVATE(conn);
 		return 0;
 	}
 
@@ -3453,6 +3460,10 @@ int tcp_flow_migrate_target_ext(struct ctx *c, struct tcp_tap_conn *conn, int fd
 		return rc;
 	}
 
+	if (conn->sock < 0)
+		/* We weren't able to create the socket, discard flow */
+		goto fail;
+
 	if (tcp_flow_select_queue(s, TCP_SEND_QUEUE))
 		goto fail;
 
@@ -3540,8 +3551,10 @@ int tcp_flow_migrate_target_ext(struct ctx *c, struct tcp_tap_conn *conn, int fd
 	return 0;
 
 fail:
-	tcp_flow_repair_off(c, conn);
-	repair_flush(c);
+	if (conn->sock >= 0) {
+		tcp_flow_repair_off(c, conn);
+		repair_flush(c);
+	}
 	conn->flags = 0; /* Not waiting for ACK, don't schedule timer */
 	tcp_rst(c, conn);