<feed xmlns='http://www.w3.org/2005/Atom'>
<title>passt/tcp_buf.c, branch podman23739</title>
<subtitle>Plug A Simple Socket Transport</subtitle>
<link rel='alternate' type='text/html' href='https://passt.top/passt/'/>
<entry>
<title>flow, treewide: Promote priority of selected flow-linked messages</title>
<updated>2026-06-09T02:28:20+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2026-06-05T12:30:40+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=dd8923b8adb9ab1e1ad79727ee0a912131f6e2cb'/>
<id>dd8923b8adb9ab1e1ad79727ee0a912131f6e2cb</id>
<content type='text'>
Most of out flow specific log messages are debug level for fear of flooding
the logs, even when they report real error conditions that might be off
significance.

Now that we have the mechanisms for log message rate limiting, we can do
better.  Promote many flow related messages to warning or error level, with
rate limiting.  While we're there add ratelimiting to a handful of existing
warning or error level messages.

They general heuristic is to promote messages that report a failure which
is not something that should be triggered by the guest doing something
weird.  This mostly means failures from socket operations we expect to be
legitimate.

Adding the ratelimiting means plumbing the 'now' timestamp through much
more of the code, hence the large churn.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Most of out flow specific log messages are debug level for fear of flooding
the logs, even when they report real error conditions that might be off
significance.

Now that we have the mechanisms for log message rate limiting, we can do
better.  Promote many flow related messages to warning or error level, with
rate limiting.  While we're there add ratelimiting to a handful of existing
warning or error level messages.

They general heuristic is to promote messages that report a failure which
is not something that should be triggered by the guest doing something
weird.  This mostly means failures from socket operations we expect to be
legitimate.

Adding the ratelimiting means plumbing the 'now' timestamp through much
more of the code, hence the large churn.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: Encode checksum computation flags in a single parameter</title>
<updated>2026-05-26T10:17:10+00:00</updated>
<author>
<name>Laurent Vivier</name>
<email>lvivier@redhat.com</email>
</author>
<published>2026-05-20T15:10:07+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=92845dd9f0450d54c135ccef6b118fc259ea8dad'/>
<id>92845dd9f0450d54c135ccef6b118fc259ea8dad</id>
<content type='text'>
tcp_fill_headers() takes a pointer to a previously computed IPv4 header
checksum to avoid recalculating it when the payload length doesn't
change, and a separate bool to skip TCP checksum computation.

Replace both parameters with a single uint32_t csum_flags that encodes:
- IP4_CSUM (bit 31): compute IPv4 header checksum from scratch
- TCP_CSUM (bit 30): compute TCP checksum
- IP4_CMASK (low 16 bits): cached IPv4 header checksum value

When IP4_CSUM is not set, the cached checksum is extracted from the low
16 bits.  This is cleaner than the pointer-based approach, and also
avoids a potential dangling pointer issue: a subsequent patch makes
tcp_fill_headers() access ip4h via with_header(), which scopes it to a
temporary variable, so a pointer to ip4h-&gt;check would become invalid
after the with_header() block.

Suggested-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Laurent Vivier &lt;lvivier@redhat.com&gt;
Reviewed-by: Jon Maloy &lt;jmaloy@redhat.com&gt;
Reviewed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
tcp_fill_headers() takes a pointer to a previously computed IPv4 header
checksum to avoid recalculating it when the payload length doesn't
change, and a separate bool to skip TCP checksum computation.

Replace both parameters with a single uint32_t csum_flags that encodes:
- IP4_CSUM (bit 31): compute IPv4 header checksum from scratch
- TCP_CSUM (bit 30): compute TCP checksum
- IP4_CMASK (low 16 bits): cached IPv4 header checksum value

When IP4_CSUM is not set, the cached checksum is extracted from the low
16 bits.  This is cleaner than the pointer-based approach, and also
avoids a potential dangling pointer issue: a subsequent patch makes
tcp_fill_headers() access ip4h via with_header(), which scopes it to a
temporary variable, so a pointer to ip4h-&gt;check would become invalid
after the with_header() block.

Suggested-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Laurent Vivier &lt;lvivier@redhat.com&gt;
Reviewed-by: Jon Maloy &lt;jmaloy@redhat.com&gt;
Reviewed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: Pass explicit data length to tcp_fill_headers()</title>
<updated>2026-05-19T23:21:51+00:00</updated>
<author>
<name>Laurent Vivier</name>
<email>lvivier@redhat.com</email>
</author>
<published>2026-05-13T11:52:17+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=de6387aa0bed9e01c99e58a0f3e01f617bc7fe33'/>
<id>de6387aa0bed9e01c99e58a0f3e01f617bc7fe33</id>
<content type='text'>
tcp_fill_headers() computed the TCP payload length from iov_tail_size(),
but with vhost-user multibuffer frames, the iov_tail will be larger than
the actual data.  Pass the data length explicitly so that IP total
length, pseudo-header, and checksum computations use the correct value.

Signed-off-by: Laurent Vivier &lt;lvivier@redhat.com&gt;
Reviewed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Reviewed-by: Jon Maloy &lt;jmaloy@redhat.com&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
tcp_fill_headers() computed the TCP payload length from iov_tail_size(),
but with vhost-user multibuffer frames, the iov_tail will be larger than
the actual data.  Pass the data length explicitly so that IP total
length, pseudo-header, and checksum computations use the correct value.

Signed-off-by: Laurent Vivier &lt;lvivier@redhat.com&gt;
Reviewed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Reviewed-by: Jon Maloy &lt;jmaloy@redhat.com&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>treewide: Make some additional variables static</title>
<updated>2026-05-11T22:04:08+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2026-05-11T10:03:21+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=97e478b40d4154cf007641f1e247cc3fef8392ad'/>
<id>97e478b40d4154cf007641f1e247cc3fef8392ad</id>
<content type='text'>
Mark a number of extra variables local to a single module as static.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Reviewed-by: Laurent Vivier &lt;lvivier@redhat.com&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Mark a number of extra variables local to a single module as static.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Reviewed-by: Laurent Vivier &lt;lvivier@redhat.com&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: Extend tcp_send_flag() to send TCP keepalive segments</title>
<updated>2026-02-24T23:17:41+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2026-02-04T11:41:36+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=a681e44ec60179567fb10f34351d7dfdbd2e7c7e'/>
<id>a681e44ec60179567fb10f34351d7dfdbd2e7c7e</id>
<content type='text'>
TCP keepalives aren't technically a flag, but they are a zero-data segment
so they can be generated with only a small modification to
tcp_{buf,vu}_send_flag().  Implement this, using a new "pseudo-flag"
value (similar to DUP_ACK), KEEPALIVE.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
[sbrivio: Fix trivial merge conflict with 812cdb802c6e]
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
TCP keepalives aren't technically a flag, but they are a zero-data segment
so they can be generated with only a small modification to
tcp_{buf,vu}_send_flag().  Implement this, using a new "pseudo-flag"
value (similar to DUP_ACK), KEEPALIVE.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
[sbrivio: Fix trivial merge conflict with 812cdb802c6e]
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: Move tap header update out of tcp_fill_headers()</title>
<updated>2026-02-15T01:52:42+00:00</updated>
<author>
<name>Laurent Vivier</name>
<email>lvivier@redhat.com</email>
</author>
<published>2026-02-10T16:08:21+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=812cdb802c6e2b217efe1756554755b784c1d9e3'/>
<id>812cdb802c6e2b217efe1756554755b784c1d9e3</id>
<content type='text'>
tcp_fill_headers() currently calls tap_hdr_update() to set the frame
length in the tap-specific header.  This is backend-specific: the tap
backend needs this for its frame length header, but the vhost-user
backend passes NULL for the tap header and doesn't use it at all.

Remove the tap_hdr parameter from tcp_fill_headers() and instead return
the computed L2 frame length.  The tap backend caller,
tcp_l2_buf_fill_headers(), now calls tap_hdr_update() itself with the
returned length.  The vhost-user callers, tcp_vu_send_flag() and
tcp_vu_prepare(), no longer need to pass a NULL tap header.

Signed-off-by: Laurent Vivier &lt;lvivier@redhat.com&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
tcp_fill_headers() currently calls tap_hdr_update() to set the frame
length in the tap-specific header.  This is backend-specific: the tap
backend needs this for its frame length header, but the vhost-user
backend passes NULL for the tap header and doesn't use it at all.

Remove the tap_hdr parameter from tcp_fill_headers() and instead return
the computed L2 frame length.  The tap backend caller,
tcp_l2_buf_fill_headers(), now calls tap_hdr_update() itself with the
returned length.  The vhost-user callers, tcp_vu_send_flag() and
tcp_vu_prepare(), no longer need to pass a NULL tap header.

Signed-off-by: Laurent Vivier &lt;lvivier@redhat.com&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: Retransmit FINs like data segments</title>
<updated>2026-01-31T02:56:50+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2026-01-30T04:41:02+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=e992b14b405ec4d0f4ea40b447029e23cbc2e30d'/>
<id>e992b14b405ec4d0f4ea40b447029e23cbc2e30d</id>
<content type='text'>
RFC 9293 doesn't distinguish between regular data segments and FIN segments
for the purposes of retransmissions.  Our existing retransmission logic
will also work for FIN segments, except for one detail: we don't currently
set the ACK_FROM_TAP_DUE flag when we send a FIN.  Add the flag, so that
we'll properly retransmit FIN segments like data segments.

Remove the section from the theory of operation comment that describes a
different way of handling FIN timeouts which (a) isn't correct behaviour
and (b) doesn't appear to be implemented.

I've tested this by adding logic to suppress sending the FIN if retries &lt;
some non-zero value.  We correctly resend the FIN and close normally after
the expected timeouts.

Link: https://bugs.passt.top/show_bug.cgi?id=195
Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
RFC 9293 doesn't distinguish between regular data segments and FIN segments
for the purposes of retransmissions.  Our existing retransmission logic
will also work for FIN segments, except for one detail: we don't currently
set the ACK_FROM_TAP_DUE flag when we send a FIN.  Add the flag, so that
we'll properly retransmit FIN segments like data segments.

Remove the section from the theory of operation comment that describes a
different way of handling FIN timeouts which (a) isn't correct behaviour
and (b) doesn't appear to be implemented.

I've tested this by adding logic to suppress sending the FIN if retries &lt;
some non-zero value.  We correctly resend the FIN and close normally after
the expected timeouts.

Link: https://bugs.passt.top/show_bug.cgi?id=195
Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp, udp: Pad batched frames to 60 bytes (802.3 minimum) in non-vhost-user modes</title>
<updated>2025-12-08T03:47:22+00:00</updated>
<author>
<name>Stefano Brivio</name>
<email>sbrivio@redhat.com</email>
</author>
<published>2025-11-03T10:16:11+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=68b0a36d6a40aece53df9d2a0a89addd07eda515'/>
<id>68b0a36d6a40aece53df9d2a0a89addd07eda515</id>
<content type='text'>
Add a further iovec frame part, TCP_IOV_ETH_PAD for TCP and
UDP_IOV_ETH_PAD for UDP, after the payload, make that point to a
zero-filled buffer, and send out a part of it if needed to reach
the minimum frame length given by 802.3, that is, 60 bytes altogether.

The frames we might need to pad are IPv4 only (the IPv6 header is
larger), and are typically TCP ACK segments but can also be small
data segments or datagrams.

Link: https://bugs.passt.top/show_bug.cgi?id=166
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
Reviewed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add a further iovec frame part, TCP_IOV_ETH_PAD for TCP and
UDP_IOV_ETH_PAD for UDP, after the payload, make that point to a
zero-filled buffer, and send out a part of it if needed to reach
the minimum frame length given by 802.3, that is, 60 bytes altogether.

The frames we might need to pad are IPv4 only (the IPv6 header is
larger), and are typically TCP ACK segments but can also be small
data segments or datagrams.

Link: https://bugs.passt.top/show_bug.cgi?id=166
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
Reviewed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: forward external source MAC address through tap interface</title>
<updated>2025-10-30T11:01:01+00:00</updated>
<author>
<name>Jon Maloy</name>
<email>jmaloy@redhat.com</email>
</author>
<published>2025-10-24T01:29:31+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=860a487d81e4e4240b0b41c9025bd3cbe6add0c1'/>
<id>860a487d81e4e4240b0b41c9025bd3cbe6add0c1</id>
<content type='text'>
We forward the incoming mac address through the tap interface when
receiving incoming packets from network local hosts.

This is a part of the solution to bug
https://bugs.passt.top/show_bug.cgi?id=120

Signed-off-by: Jon Maloy &lt;jmaloy@redhat.com&gt;
Reviewed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We forward the incoming mac address through the tap interface when
receiving incoming packets from network local hosts.

This is a part of the solution to bug
https://bugs.passt.top/show_bug.cgi?id=120

Signed-off-by: Jon Maloy &lt;jmaloy@redhat.com&gt;
Reviewed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: Fix ACK sequence on FIN to tap</title>
<updated>2025-10-07T20:22:27+00:00</updated>
<author>
<name>Stefano Brivio</name>
<email>sbrivio@redhat.com</email>
</author>
<published>2025-09-30T20:26:02+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=b3217aa5aec10704774c7142ef07daec5e698415'/>
<id>b3217aa5aec10704774c7142ef07daec5e698415</id>
<content type='text'>
If we reach end-of-file on a socket (or get EPOLLRDHUP / EPOLLHUP) and
send a FIN segment to the guest / container acknowledging a sequence
number that's behind what we received so far, we won't have any
further trigger to send an updated ACK segment, as we are now
switching the epoll socket monitoring to edge-triggered mode.

To avoid this situation, in tcp_update_seqack_wnd(), we set the next
acknowledgement sequence to the current observed sequence, regardless
of what was acknowledged socket-side.

However, we don't necessarily call tcp_update_seqack_wnd() before
sending the FIN segment, which might potentially lead to a situation,
not observed in practice, where we unnecessarily cause a
retransmission at some point after our FIN segment.

Avoid that by setting the ACK sequence to whatever we received from
the container / guest, before sending a FIN segment and switching to
EPOLLET.

Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
Reviewed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If we reach end-of-file on a socket (or get EPOLLRDHUP / EPOLLHUP) and
send a FIN segment to the guest / container acknowledging a sequence
number that's behind what we received so far, we won't have any
further trigger to send an updated ACK segment, as we are now
switching the epoll socket monitoring to edge-triggered mode.

To avoid this situation, in tcp_update_seqack_wnd(), we set the next
acknowledgement sequence to the current observed sequence, regardless
of what was acknowledged socket-side.

However, we don't necessarily call tcp_update_seqack_wnd() before
sending the FIN segment, which might potentially lead to a situation,
not observed in practice, where we unnecessarily cause a
retransmission at some point after our FIN segment.

Avoid that by setting the ACK sequence to whatever we received from
the container / guest, before sending a FIN segment and switching to
EPOLLET.

Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
Reviewed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
</pre>
</div>
</content>
</entry>
</feed>
