author Stefano Brivio <sbrivio@redhat.com> 2025-12-04 07:13:32 +0100
committer Stefano Brivio <sbrivio@redhat.com> 2025-12-08 09:15:36 +0100
commit 2b5c9064dbf9f9840d8c44ee7b0bdaebd27b6a8e
tree 906f05bac9a94f13661d0c7a9e3d25a65e7bd99f
parent cf1925fb7b777d1b4dae4816f30af1fa5c6ebfa5
tcp: Allow exceeding the available sending buffer size in window advertisements

If the remote peer is advertising a bigger value than our current
sending buffer, it means that a bigger sending buffer is likely to
benefit throughput.

We can get a bigger sending buffer by means of the buffer size
auto-tuning performed by the Linux kernel, which is triggered by
aggressively filling the sending buffer.

Use an adaptive boost factor, up to 150%, depending on:

- how much data we sent so far: we don't want to risk retransmissions
  for short-lived connections, as the latency cost would be
  unacceptable, and

- the current RTT value, as we need a bigger buffer for higher
  transmission delays

The factor we use is not quite a bandwidth-delay product, as we're
missing the time component of the bandwidth, which is not interesting
here: we are trying to make the buffer grow at the beginning of a
connection, progressively, as more data is sent.

The tuning of the amount of boost factor we want to apply was done
somewhat empirically but it appears to yield the available throughput
in rather different scenarios (from ~ 10 Gbps bandwidth with 500 ns to
~ 1 Gbps with 300 ms RTT) and it allows getting there rather quickly,
within a few seconds for the 300 ms case.

Note that we want to apply this factor only if the window advertised
by the peer is bigger than the current sending buffer, as we only need
this for auto-tuning, and we absolutely don't want to incur
unnecessary retransmissions otherwise.

The related condition in tcp_update_seqack_wnd() is not redundant as
there's a subtractive factor, sendq, in the calculation of the window
limit. If the sending buffer is smaller than the peer's advertised
window, the additional limit we might apply might be lower than we
would do otherwise.

Assuming that the sending buffer is reported as 100k, sendq is 20k, we
could have these example cases:

1. tinfo->tcpi_snd_wnd is 120k, which is bigger than the sending
   buffer, so we boost its size to 150k, and we limit the window to
   120k

2. tinfo->tcpi_snd_wnd is 90k, which is smaller than the sending
   buffer, so we aren't trying to trigger buffer auto-tuning and we'll
   stick to the existing, more conservative calculation, by limiting
   the window to 100 - 20 = 80k

If we omitted the new condition, we would always use the boosted
value, that is, 120k, even if potentially causing unnecessary
retransmissions.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>

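
The two example cases in the message above can be worked through with a small sketch. The helper name window_limit_sketch() and its parameters are hypothetical. The boosted buffer size is passed in as a fixed value rather than computed, and the final cap against the peer's window is an assumption made so that the numbers match the examples; it is not part of this diff.

```c
/* Hypothetical helper mirroring the three branches of the limit
 * calculation in tcp_update_seqack_wnd(). All sizes are in bytes.
 */
static long window_limit_sketch(long sndbuf, long boosted, long sendq,
				long peer_wnd)
{
	long limit;

	if (sendq > sndbuf)		/* queue already exceeds the buffer */
		limit = 0;
	else if (peer_wnd > sndbuf)	/* peer could take more: boost */
		limit = boosted - sendq;
	else				/* existing, conservative calculation */
		limit = sndbuf - sendq;

	/* Assumption: the limit is then capped by the peer's window */
	return peer_wnd < limit ? peer_wnd : limit;
}
```

With a 100k buffer, a 150k boosted size and a 20k sendq, a 120k peer window takes the boosted branch (150k - 20k = 130k, capped to 120k), while a 90k peer window takes the conservative branch and yields 80k.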
-rw-r--r-- tcp.c 38
1 files changed, 38 insertions, 0 deletions
diff --git a/tcp.c b/tcp.c
index 8bf1b51..1f57fda 100644
--- a/tcp.c
+++ b/tcp.c
@@ -353,6 +353,13 @@ enum {
#define LOW_RTT_TABLE_SIZE 8
#define LOW_RTT_THRESHOLD 10 /* us */
+/* Parameters to temporarily exceed sending buffer to force TCP auto-tuning */
+#define SNDBUF_BOOST_BYTES_RTT_LO 2500 /* B * s: no boost until here */
+/* ...examples: 5 MB sent * 500 ns RTT, 250 kB * 10 ms, 8 kB * 300 ms */
+#define SNDBUF_BOOST_FACTOR 150 /* % */
+#define SNDBUF_BOOST_BYTES_RTT_HI 6000 /* apply full boost factor */
+/* 12 MB sent * 500 ns RTT, 600 kB * 10 ms, 20 kB * 300 ms */
+
/* Ratio of buffer to bandwidth * delay product implying interactive traffic */
#define SNDBUF_TO_BW_DELAY_INTERACTIVE /* > */ 20 /* (i.e. < 5% of buffer) */
@@ -1037,6 +1044,35 @@ void tcp_fill_headers(const struct ctx *c, struct tcp_tap_conn *conn,
}
/**
+ * tcp_sndbuf_boost() - Calculate limit of sending buffer to force auto-tuning
+ * @conn: Connection pointer
+ * @tinfo: tcp_info from kernel, must be pre-fetched
+ *
+ * Return: increased sending buffer to use as a limit for advertised window
+ */
+static unsigned long tcp_sndbuf_boost(const struct tcp_tap_conn *conn,
+				      const struct tcp_info_linux *tinfo)
+{
+	unsigned long bytes_rtt_product;
+
+	if (!bytes_acked_cap)
+		return SNDBUF_GET(conn);
+
+	/* This is *not* a bandwidth-delay product, but it's somewhat related:
+	 * as we send more data (usually at the beginning of a connection), we
+	 * try to make the sending buffer progressively grow, with the RTT as a
+	 * factor (longer delay, bigger buffer needed).
+	 */
+	bytes_rtt_product = (long long)tinfo->tcpi_bytes_acked *
+			    tinfo->tcpi_rtt / 1000 / 1000;
+
+	return clamped_scale(SNDBUF_GET(conn), bytes_rtt_product,
+			     SNDBUF_BOOST_BYTES_RTT_LO,
+			     SNDBUF_BOOST_BYTES_RTT_HI,
+			     SNDBUF_BOOST_FACTOR);
+}
+
+/**
* tcp_update_seqack_wnd() - Update ACK sequence and window to guest/tap
* @c: Execution context
* @conn: Connection pointer
@@ -1155,6 +1191,8 @@ int tcp_update_seqack_wnd(const struct ctx *c, struct tcp_tap_conn *conn,
 	if ((int)sendq > SNDBUF_GET(conn)) /* Due to memory pressure? */
 		limit = 0;
+	else if ((int)tinfo->tcpi_snd_wnd > SNDBUF_GET(conn))
+		limit = tcp_sndbuf_boost(conn, tinfo) - (int)sendq;
 	else
 		limit = SNDBUF_GET(conn) - (int)sendq;
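
The boost calculation in tcp_sndbuf_boost() relies on clamped_scale(), which is defined elsewhere in the tree and is not shown in this diff. The following is only a sketch of how such a calculation might behave, assuming a linear ramp from 100% to the boost factor between the two thresholds. The names scale_sketch(), bytes_rtt_sketch() and boost_sketch() are hypothetical, and the constants simply copy the values of the macros added above.

```c
#include <stdint.h>

#define BOOST_LO	2500	/* B * s, value of SNDBUF_BOOST_BYTES_RTT_LO */
#define BOOST_HI	6000	/* B * s, value of SNDBUF_BOOST_BYTES_RTT_HI */
#define BOOST_FACTOR	150	/* %, value of SNDBUF_BOOST_FACTOR */

/* Hypothetical stand-in for clamped_scale(): return x unchanged up to
 * lo, x scaled by f% from hi onwards, and (by assumption) a linear
 * interpolation between the two.
 */
static long scale_sketch(long x, long y, long lo, long hi, long f)
{
	if (y <= lo)
		return x;
	if (y >= hi)
		return x * f / 100;

	return x + x * (f - 100) / 100 * (y - lo) / (hi - lo);
}

/* Acknowledged bytes times RTT in microseconds, scaled down to B * s */
static long bytes_rtt_sketch(uint64_t bytes_acked, uint32_t rtt_us)
{
	return (long)(bytes_acked * rtt_us / 1000 / 1000);
}

/* Boosted sending buffer size for a given amount of data and RTT */
static long boost_sketch(long sndbuf, uint64_t bytes_acked, uint32_t rtt_us)
{
	return scale_sketch(sndbuf, bytes_rtt_sketch(bytes_acked, rtt_us),
			    BOOST_LO, BOOST_HI, BOOST_FACTOR);
}
```

Under these assumptions, 250 kB acknowledged at a 10 ms RTT gives a product of 2500 B * s, so a 100 kB buffer is not boosted yet. At 600 kB and 10 ms, or 20 kB and 300 ms, the product reaches 6000 B * s and the same buffer would be reported as 150 kB.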