| author | Jon Maloy <jmaloy@redhat.com> | 2026-04-25 15:58:18 -0400 |
|---|---|---|
| committer | Stefano Brivio <sbrivio@redhat.com> | 2026-05-07 08:03:14 +0200 |
| commit | a458719f01838f0d5867e817d84a497637999ee1 (patch) | |
| tree | c4a406dca11a06e0c5ccc1587817af3ec16a45af /test/lib/git:/passt.top | |
| parent | ec96f0124282338cd2b2e65ff1aa3def8882ae23 (diff) | |

tcp: Use SO_MEMINFO for accurate send buffer overhead accounting

The TCP window advertised to the guest/container must balance two
competing needs: it has to be large enough to trigger kernel socket
buffer auto-tuning, but not so large that sendmsg() accepts only part
of the data, causing retransmissions.

The current approach uses the difference (SNDBUF_GET() - SIOCOUTQ), but
SNDBUF_GET() returns a scaled value that only roughly accounts for
per-skb overhead. The clamped_scale approximation doesn't accurately
track the actual per-segment overhead, which can lead to both excessive
retransmissions and reduced throughput.

We now use SO_MEMINFO to obtain SK_MEMINFO_SNDBUF and
SK_MEMINFO_WMEM_QUEUED from the kernel. The latter is expressed in the
kernel's own accounting units, that is, including the sk_buff overhead,
and is exactly the value the kernel's own sk_stream_memory_free()
function uses.

When data is queued and the overhead ratio is observable, we calculate
the per-segment overhead as (wmem_queued - sendq) / num_segments, then
determine how many additional segments fit in the remaining buffer
space given that per-segment overhead. This approach treats segments as
discrete quantities and produces a more accurate estimate of available
buffer space than a linear scaling factor does.
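
The arithmetic described above can be sketched as follows. This is not the code from the patch: the function name seg_room() is hypothetical, and deriving num_segments as sendq divided by the MSS, rounded up, is an assumption made for the example.

```c
#include <stdint.h>

/* Hypothetical sketch: estimate how many payload bytes still fit in the
 * send buffer, counting whole segments only.
 *
 * sndbuf and wmem_queued are in kernel accounting units (payload plus
 * sk_buff overhead); sendq is the queued payload in bytes.
 *
 * Returns 0 and sets *room on success, -1 if the overhead ratio cannot
 * be observed (nothing queued, or inconsistent counters).
 */
static int seg_room(uint32_t sndbuf, uint32_t wmem_queued, uint32_t sendq,
		    uint32_t mss, uint32_t *room)
{
	uint32_t nsegs, overhead;
	uint64_t per_seg;

	if (!mss || !sendq || wmem_queued <= sendq)
		return -1;

	/* Assumption: queued data is split into MSS-sized segments */
	nsegs = sendq / mss + !!(sendq % mss);
	overhead = (wmem_queued - sendq) / nsegs;
	per_seg = (uint64_t)mss + overhead;

	if (wmem_queued >= sndbuf) {
		*room = 0;
		return 0;
	}

	/* Whole segments that fit, converted back to payload bytes */
	*room = (uint32_t)((sndbuf - wmem_queued) / per_seg) * mss;
	return 0;
}
```

For example, with sndbuf = 100000, wmem_queued = 20000, sendq = 10000 and mss = 1000, ten segments are queued with 1000 units of overhead each, so each further segment costs 2000 units and 40 more segments (40000 payload bytes) fit in the remaining 80000 units.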

When the ratio cannot be observed, e.g. because the queue is empty or
we are in a transient state, we fall back to the existing clamped_scale
calculation (scaling between 100% and 75% of buffer capacity).
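
To illustrate what a scaling between 100% and 75% means, here is a generic linear clamped-scale sketch. It is not passt's clamped_scale implementation: the function name, its signature, and the thresholds used in the example afterwards are all assumptions.

```c
#include <stdint.h>

/* Hypothetical sketch: scale x down from 100% to pct% as y moves from
 * lo to hi. Below lo, x is returned unchanged; above hi, pct% of x is
 * returned; in between, the reduction grows linearly with y.
 */
static uint32_t clamped_scale_sketch(uint32_t x, uint32_t y,
				     uint32_t lo, uint32_t hi, uint32_t pct)
{
	uint64_t drop;

	if (pct > 100)
		pct = 100;

	if (y <= lo)
		return x;

	if (y >= hi)
		return (uint32_t)((uint64_t)x * pct / 100);

	/* Here lo < y < hi, so hi - lo is non-zero */
	drop = (uint64_t)x * (100 - pct) / 100 * (y - lo) / (hi - lo);

	return x - (uint32_t)drop;
}
```

With x = 1000, lo = 100, hi = 200 and pct = 75, a value of y halfway between the thresholds (150) yields 875, which is halfway between 100% and 75% of x.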

When SO_MEMINFO succeeds, we also use SK_MEMINFO_SNDBUF directly to
set SNDBUF, avoiding a separate SO_SNDBUF getsockopt() call.

If SO_MEMINFO is unavailable, we fall back to the pre-existing
SNDBUF_GET() - SIOCOUTQ calculation.
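
For comparison, a simplified version of that older calculation is sketched below. It is an approximation for illustration: it queries SO_SNDBUF from the kernel directly and applies no scaling, whereas SNDBUF_GET() in passt returns a scaled value, and the helper name sndbuf_room_legacy() is hypothetical.

```c
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <linux/sockios.h>	/* SIOCOUTQ */

/* Hypothetical sketch: estimate free send buffer space as the socket's
 * send buffer size minus the bytes still in the send queue.
 * Returns 0 and sets *room on success, -1 on failure.
 */
static int sndbuf_room_legacy(int s, int *room)
{
	int sndbuf, outq;
	socklen_t len = sizeof(sndbuf);

	if (getsockopt(s, SOL_SOCKET, SO_SNDBUF, &sndbuf, &len) < 0)
		return -1;

	/* Bytes queued for sending on this socket */
	if (ioctl(s, SIOCOUTQ, &outq) < 0)
		return -1;

	*room = sndbuf > outq ? sndbuf - outq : 0;
	return 0;
}
```

Note that SIOCOUTQ counts payload bytes while SO_SNDBUF is expressed in kernel accounting units, which is the mismatch the SO_MEMINFO approach avoids.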

Link: https://bugs.passt.top/show_bug.cgi?id=138
Link: https://github.com/containers/podman/issues/28219
Analysed-by: Yumei Huang <yuhuang@redhat.com>
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>

Diffstat (limited to 'test/lib/git:/passt.top')
0 files changed, 0 insertions, 0 deletions
