author Stefano Brivio <sbrivio@redhat.com> 2025-01-17 10:10:10 +0100
committer Stefano Brivio <sbrivio@redhat.com> 2025-01-21 14:28:37 +0100
commit 54bb972cfb2637f64a9718023a2351f8f259abdb
tree 53d71b98e587ec3b278e6983f976c6c28330b090
parent 8757834d145a06b845aa0bb6bdfd4f93971b8d74
tcp: Disable Nagle's algorithm (set TCP_NODELAY) on all sockets
Following up on 725acd111ba3 ("tcp_splice: Set (again) TCP_NODELAY on both sides"), David argues that, in general, we don't know what kind of TCP traffic we're dealing with, on any side or path.

TCP segments might have been delivered to our socket with a PSH flag, but we don't have a way to know about it.

Similarly, the guest might send us segments with PSH or URG set, but we don't know if we should generally TCP_CORK sockets and uncork on those flags, because that would assume they're running a Linux kernel (and a particular version of it) matching the kernel that delivers outbound packets for us.

Given that we can't make any assumption and everything might very well be interactive traffic, disable Nagle's algorithm on all non-spliced sockets as well.

After all, John Nagle himself nowadays recommends that delayed ACKs should never be enabled together with his algorithm, but we don't have a practical way to ensure that our environment is free from delayed ACKs (TCP_QUICKACK is not really usable for this purpose):

https://news.ycombinator.com/item?id=34180239

Suggested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
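TCP_NODELAY is a per-socket boolean option. A minimal sketch of setting it and reading it back on a plain Linux TCP socket follows. `set_nodelay()` and `get_nodelay()` are hypothetical names, and `IPPROTO_TCP` is used in place of the patch's `SOL_TCP` (the two have the same value on Linux). On a freshly created socket the option is expected to read back as 0, because Nagle's algorithm is enabled by default.

```c
#include <netinet/in.h>   /* IPPROTO_TCP */
#include <netinet/tcp.h>  /* TCP_NODELAY */
#include <stdio.h>        /* perror() */
#include <sys/socket.h>   /* socket(), setsockopt(), getsockopt() */

/* Hypothetical helper: disable Nagle's algorithm on socket s.
 * Returns 0 on success, -1 on error (after printing the reason). */
static int set_nodelay(int s)
{
	int one = 1;

	if (setsockopt(s, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one))) {
		perror("setsockopt(TCP_NODELAY)");
		return -1;
	}
	return 0;
}

/* Hypothetical helper: read TCP_NODELAY back.
 * Returns 1 if Nagle's algorithm is disabled, 0 if it is enabled,
 * -1 if the option could not be read. */
static int get_nodelay(int s)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(s, IPPROTO_TCP, TCP_NODELAY, &val, &len))
		return -1;
	return val != 0;
}
```

The option only changes how the local kernel coalesces small writes on that socket; it does not change whether the peer delays its ACKs, which is why the commit message treats delayed ACKs as a separate factor that passt cannot control.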
-rw-r--r-- tcp.c 15
1 file changed, 15 insertions, 0 deletions
diff --git a/tcp.c b/tcp.c
index a012b81..4d6a6b3 100644
--- a/tcp.c
+++ b/tcp.c
@@ -757,6 +757,19 @@ static void tcp_sock_set_bufsize(const struct ctx *c, int s)
}
/**
+ * tcp_sock_set_nodelay() - Set TCP_NODELAY option (disable Nagle's algorithm)
+ * @s: Socket, can be -1 to avoid check in the caller
+ */
+static void tcp_sock_set_nodelay(int s)
+{
+ if (s == -1)
+ return;
+
+ if (setsockopt(s, SOL_TCP, TCP_NODELAY, &((int){ 1 }), sizeof(int)))
+ debug("TCP: failed to set TCP_NODELAY on socket %i", s);
+}
+
+/**
* tcp_update_csum() - Calculate TCP checksum
* @psum: Unfolded partial checksum of the IPv4 or IPv6 pseudo-header
* @th: TCP header (updated)
@@ -1285,6 +1298,7 @@ static int tcp_conn_new_sock(const struct ctx *c, sa_family_t af)
return -errno;
tcp_sock_set_bufsize(c, s);
+ tcp_sock_set_nodelay(s);
return s;
}
@@ -2058,6 +2072,7 @@ void tcp_listen_handler(const struct ctx *c, union epoll_ref ref,
goto cancel;
tcp_sock_set_bufsize(c, s);
+ tcp_sock_set_nodelay(s);
/* FIXME: When listening port has a specific bound address, record that
* as our address
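The patch calls the new helper in the two places where a non-spliced TCP socket is obtained: right after creating one for an outbound connection (`tcp_conn_new_sock()`) and right after accepting an inbound one in `tcp_listen_handler()`. The sketch below reproduces that pattern over loopback, assuming a plain `connect()`/`accept()` pair is a fair stand-in for those two paths. `sock_set_nodelay()`, `nodelay_is_set()` and `loopback_pair_nodelay()` are hypothetical names, and errors are printed to stderr instead of going through passt's `debug()`.

```c
#include <arpa/inet.h>    /* htonl() */
#include <netinet/in.h>   /* struct sockaddr_in, INADDR_LOOPBACK */
#include <netinet/tcp.h>  /* TCP_NODELAY */
#include <stdio.h>        /* fprintf() */
#include <string.h>       /* memset() */
#include <sys/socket.h>
#include <unistd.h>       /* close() */

/* Same shape as tcp_sock_set_nodelay() in the patch: -1 is accepted so
 * callers need no check of their own, and failures are only logged. */
static void sock_set_nodelay(int s)
{
	if (s == -1)
		return;

	if (setsockopt(s, IPPROTO_TCP, TCP_NODELAY, &((int){ 1 }), sizeof(int)))
		fprintf(stderr, "failed to set TCP_NODELAY on socket %i\n", s);
}

/* Returns non-zero if TCP_NODELAY is set on s */
static int nodelay_is_set(int s)
{
	int val = 0;
	socklen_t len = sizeof(val);

	return !getsockopt(s, IPPROTO_TCP, TCP_NODELAY, &val, &len) && val;
}

/* Hypothetical helper: build a loopback TCP connection, setting
 * TCP_NODELAY on the connecting socket before connect() and on the socket
 * returned by accept(). Returns 0 and fills *cfd, *afd on success, -1 on
 * failure. */
static int loopback_pair_nodelay(int *cfd, int *afd)
{
	struct sockaddr_in addr;
	socklen_t alen = sizeof(addr);
	int lfd, c = -1, a;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;	/* let the kernel pick a free port */

	lfd = socket(AF_INET, SOCK_STREAM, 0);
	if (lfd < 0)
		return -1;
	if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) ||
	    listen(lfd, 1) ||
	    getsockname(lfd, (struct sockaddr *)&addr, &alen))
		goto fail;

	/* Outbound side: set the option on the new socket, then connect */
	c = socket(AF_INET, SOCK_STREAM, 0);
	if (c < 0)
		goto fail;
	sock_set_nodelay(c);
	if (connect(c, (struct sockaddr *)&addr, sizeof(addr)))
		goto fail;

	/* Inbound side: set the option on the accepted socket.
	 * Passing -1 (accept() failure) to the helper is harmless. */
	a = accept(lfd, NULL, NULL);
	sock_set_nodelay(a);
	if (a < 0)
		goto fail;

	close(lfd);
	*cfd = c;
	*afd = a;
	return 0;

fail:
	if (c >= 0)
		close(c);
	close(lfd);
	return -1;
}
```

Accepting -1 in the helper follows the design of the patch: the result of `socket()` or `accept()` can be passed straight through, and a failure to set the option is only logged, since the connection still works with Nagle's algorithm enabled.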