| author | Stefano Brivio <sbrivio@redhat.com> | 2025-12-03 20:04:21 +0100 |
|---|---|---|
| committer | Stefano Brivio <sbrivio@redhat.com> | 2025-12-08 09:15:36 +0100 |
| commit | 000601ba86da0d876fc91e0813a1e752540666f1 | |
| tree | b82da8e30fec32e1eda175ae6a232de9d02ac999 | |
| parent | 920a479de40b58a81178e5d6e96c0eed30b992d5 | |
tcp: Adaptive interval based on RTT for socket-side acknowledgement checks

A fixed 10 ms ACK_INTERVAL timer value served us relatively well until
the previous change, because we would generally cause retransmissions
for non-local outbound transfers with high (> 100 Mbps) bandwidth and
a low (< 5 ms) RTT, and those retransmissions, in turn, prompted us to
check the socket for acknowledged data.

Now that retransmissions are less frequent, we don't have a proper
trigger to check for acknowledged bytes on the socket, and we will
generally block the sender for a significant amount of time when we
could be acknowledging more data instead.

Store the RTT reported by the kernel using an approximation (exponent),
to keep flow storage size within two (typical) cachelines. Check for
socket updates when half of this time elapses: it should be a good
indication of the one-way delay we're interested in (peer to us).
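
A minimal C sketch of the exponent approximation and of the half-RTT
check interval follows. It assumes a 4-bit exponent over a 100 us base
step, doubling per step, which is consistent with the 100 us to
3.2768 s range stated in this message, and it assumes rounding up to
the next representable value. RTT_EXP_BITS, RTT_BASE_US,
rtt_exp_encode(), rtt_exp_decode() and ack_check_interval() are
hypothetical names; the actual passt code may encode and round
differently. The input is tcpi_rtt from struct tcp_info, which the
kernel reports in microseconds.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical parameters: 4-bit exponent, 100 us base step */
#define RTT_EXP_BITS	4
#define RTT_EXP_MAX	((1U << RTT_EXP_BITS) - 1)	/* 15 */
#define RTT_BASE_US	100U

/* Encode an RTT in microseconds as the smallest exponent e such that
 * RTT_BASE_US << e is at least rtt_us. Values below 100 us map to 0,
 * values above 3.2768 s are clamped to RTT_EXP_MAX.
 */
static uint8_t rtt_exp_encode(uint32_t rtt_us)
{
	uint8_t e = 0;

	while (e < RTT_EXP_MAX && (RTT_BASE_US << e) < rtt_us)
		e++;

	return e;
}

/* Decode an exponent back to an approximate RTT in microseconds */
static uint32_t rtt_exp_decode(uint8_t e)
{
	return RTT_BASE_US << e;
}

/* Interval for socket-side ACK checks: half the approximated RTT, as a
 * struct timespec suitable, for example, for an itimerspec passed to
 * timerfd_settime()
 */
static struct timespec ack_check_interval(uint8_t e)
{
	uint32_t half_us = rtt_exp_decode(e) / 2;
	struct timespec ts = {
		.tv_sec = half_us / 1000000,
		.tv_nsec = (long)(half_us % 1000000) * 1000,
	};

	return ts;
}
```

For example, a 2.5 ms sample would be stored as exponent 5 (3.2 ms),
and the socket would then be re-checked every 1.6 ms instead of every
10 ms.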
Representable values are between 100 us and 3.2768 s, and any value
outside this range is clamped to these bounds. This choice appears
to be a good trade-off between additional overhead and throughput.
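
As a worked check of these bounds, assuming (this is not taken from the
passt source) a 100 us base doubled once per exponent step and rounding
up for any rtt > 0, the range needs exactly 16 exponent values, that
is, 4 bits per flow, and the resulting check interval spans 50 us to
1.6384 s:

```latex
\begin{aligned}
100\,\mu\mathrm{s} \cdot 2^{15} &= 3\,276\,800\,\mu\mathrm{s} = 3.2768\,\mathrm{s} \\
e &\in \{0, 1, \ldots, 15\} \quad (16 \text{ values, i.e. } 4 \text{ bits}) \\
e(\mathit{rtt}) &= \min\!\left(15,\ \max\!\left(0,\ \left\lceil \log_2 \frac{\mathit{rtt}}{100\,\mu\mathrm{s}} \right\rceil\right)\right) \\
t_{\mathrm{check}}(e) &= \tfrac{1}{2} \cdot 100\,\mu\mathrm{s} \cdot 2^{e} \in [50\,\mu\mathrm{s},\ 1.6384\,\mathrm{s}]
\end{aligned}
```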
This mechanism partially overlaps with the "low RTT" destinations,
which we use to infer that a socket is connected to an endpoint on
the same machine (possibly in a different namespace) if the RTT is
reported as 10 us or less.
This change doesn't, however, conflict with it: we are reading
TCP_INFO parameters for local connections anyway, so we can always
store the RTT approximation opportunistically.
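
A sketch of sampling both properties from a single getsockopt() call
with TCP_INFO follows. The tcpi_rtt field of struct tcp_info is the
smoothed RTT in microseconds; struct flow_rtt and flow_rtt_sample()
are hypothetical, and the 4-bit exponent over a 100 us base is the
same assumption as in the other sketches, not passt's confirmed
layout. Callers are assumed to sample connected sockets only.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Hypothetical per-flow state: only what this example needs */
struct flow_rtt {
	uint8_t rtt_exp;	/* RTT approximation, 100 us << rtt_exp */
	bool low_rtt;		/* RTT of 10 us or less: peer likely local */
};

/* Read TCP_INFO once and derive both pieces of state from it.
 * Returns 0 on success, -1 (with errno set by getsockopt()) on failure.
 */
static int flow_rtt_sample(int s, struct flow_rtt *f)
{
	struct tcp_info tinfo;
	socklen_t sl = sizeof(tinfo);
	uint8_t e = 0;

	if (getsockopt(s, IPPROTO_TCP, TCP_INFO, &tinfo, &sl))
		return -1;

	/* Smallest e with 100 us << e >= tcpi_rtt, clamped to 15 */
	while (e < 15 && (100U << e) < tinfo.tcpi_rtt)
		e++;

	f->rtt_exp = e;
	f->low_rtt = tinfo.tcpi_rtt <= 10;

	return 0;
}
```

Since the RTT exponent comes from the same TCP_INFO read used for the
low-RTT check, storing it costs no extra system call.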
If the RTT is "low", we don't really need a timer to acknowledge
data, as we'll always acknowledge everything to the sender right
away. However, the array where we store addresses of local
destinations has limited space, so the low RTT property of a
connection might toggle frequently. Because of this, it's actually
helpful to always have the RTT approximation stored.
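
The resulting decision can be sketched as follows: a low-RTT flow is
acknowledged right away, and any other flow, including one whose entry
was just evicted from the local destination array, falls back to a
timer of half its own stored RTT approximation rather than a fixed
10 ms. enum ack_action, struct ack_plan and plan_ack() are
hypothetical, and the 100 us << rtt_exp decoding is the same assumed
encoding as above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: what to do after reading new data from the socket */
enum ack_action {
	ACK_NOW,	/* low RTT: acknowledge everything to the sender now */
	ACK_TIMER,	/* otherwise: re-check the socket after half an RTT */
};

struct ack_plan {
	enum ack_action action;
	uint32_t timer_us;	/* meaningful for ACK_TIMER only */
};

/* low_rtt may toggle as the bounded destination array evicts entries,
 * while rtt_exp is always stored per flow, so a timer value is always
 * available when low_rtt is false
 */
static struct ack_plan plan_ack(bool low_rtt, uint8_t rtt_exp)
{
	struct ack_plan p = { ACK_NOW, 0 };

	if (rtt_exp > 15)	/* 4-bit storage: clamp defensively */
		rtt_exp = 15;

	if (!low_rtt) {
		p.action = ACK_TIMER;
		p.timer_us = (100U << rtt_exp) / 2;
	}

	return p;
}
```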
This could probably benefit from a future rework, though, introducing
a more integrated approach between these two mechanisms.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
