<feed xmlns='http://www.w3.org/2005/Atom'>
<title>passt/util.c, branch 2026_05_07.1afd4ed</title>
<subtitle>Plug A Simple Socket Transport</subtitle>
<link rel='alternate' type='text/html' href='https://passt.top/passt/'/>
<entry>
<title>bitmap: Split bitmap helper functions into their own module</title>
<updated>2026-03-28T13:36:02+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2026-03-27T04:34:29+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=51eaaa7be714229c1b2e2f7ba5f3c5eae36c8a61'/>
<id>51eaaa7be714229c1b2e2f7ba5f3c5eae36c8a61</id>
<content type='text'>
Currently bitmap functions are in util.[ch] along with a lot of other
stuff.  In preparation for sharing them with a configuration client, move
these out into their own files.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently bitmap functions are in util.[ch] along with a lot of other
stuff.  In preparation for sharing them with a configuration client, move
these out into their own files.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>serialise: Split functions user for serialisation from util.c</title>
<updated>2026-03-28T13:35:41+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2026-03-27T04:34:16+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=8081aa597be4007f3ef15fe8ce11ec8ad0ab8b42'/>
<id>8081aa597be4007f3ef15fe8ce11ec8ad0ab8b42</id>
<content type='text'>
The read_all_buf() and write_all_buf() functions in util.c are
primarily used for serialising data structures to a stream during
migraiton.  We're going to have further use for such serialisation
when we add dynamic configuration updates, where we'll want to share
the code with the client program.

To make that easier move the functions into a new serialise.c
file, and rename thematically.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The read_all_buf() and write_all_buf() functions in util.c are
primarily used for serialising data structures to a stream during
migraiton.  We're going to have further use for such serialisation
when we add dynamic configuration updates, where we'll want to share
the code with the client program.

To make that easier move the functions into a new serialise.c
file, and rename thematically.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>treewide: Spell ASSERT() as assert()</title>
<updated>2026-03-20T20:05:29+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2026-03-19T06:11:43+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=bc872d91765dfd6ff34b0e9a34bce410fac1cef3'/>
<id>bc872d91765dfd6ff34b0e9a34bce410fac1cef3</id>
<content type='text'>
The standard library assert(3), at least with glibc, hits our seccomp
filter and dies with SIGSYS before it's able to print a message, making it
near useless.  Therefore, since 7a8ed9459dfe ("Make assertions actually
useful") we've instead used our own implementation, named ASSERT().

This makes our code look slightly odd though - ASSERT() has the same
overall effect as assert(), it's just a different implementation.  More
importantly this makes it awkward to share code between passt/pasta proper
and things that compile in a more typical environment.  We're going to want
that for our upcoming dynamic configuration tool.

Address this by overriding the standard library's assert() implementation
with our own, instead of giving ours its own name.

The standard assert() is supposed to be omitted if NDEBUG is defined,
which ours doesn't do.  Implement that as well, so ours doesn't
unexpectedly differ.  For the -DNDEBUG case we do this by *not* overriding
assert(), since it will be a no-op anyway.  This requires a few places to
add a #include &lt;assert.h&gt; to let us compile (albeit with warnings) when
-DNDEBUG.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
[sbrivio: Fix some conflicts and missing conversions as a result of
 applying "vu_common: Move iovec management into vu_collect()" first]
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The standard library assert(3), at least with glibc, hits our seccomp
filter and dies with SIGSYS before it's able to print a message, making it
near useless.  Therefore, since 7a8ed9459dfe ("Make assertions actually
useful") we've instead used our own implementation, named ASSERT().

This makes our code look slightly odd though - ASSERT() has the same
overall effect as assert(), it's just a different implementation.  More
importantly this makes it awkward to share code between passt/pasta proper
and things that compile in a more typical environment.  We're going to want
that for our upcoming dynamic configuration tool.

Address this by overriding the standard library's assert() implementation
with our own, instead of giving ours its own name.

The standard assert() is supposed to be omitted if NDEBUG is defined,
which ours doesn't do.  Implement that as well, so ours doesn't
unexpectedly differ.  For the -DNDEBUG case we do this by *not* overriding
assert(), since it will be a no-op anyway.  This requires a few places to
add a #include &lt;assert.h&gt; to let us compile (albeit with warnings) when
-DNDEBUG.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
[sbrivio: Fix some conflicts and missing conversions as a result of
 applying "vu_common: Move iovec management into vu_collect()" first]
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Add missing includes to headers</title>
<updated>2026-03-04T16:39:57+00:00</updated>
<author>
<name>Peter Foley</name>
<email>pefoley@google.com</email>
</author>
<published>2026-02-23T18:11:19+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=adbf5c135f19db5b6751393b7f5cbf516031bde8'/>
<id>adbf5c135f19db5b6751393b7f5cbf516031bde8</id>
<content type='text'>
Support build systems like bazel that check that headers are
self-contained.

Also update includes so that clang-include-cleaner succeeds.

Tested with:
clang-include-cleaner-19 --extra-arg=-D_GNU_SOURCE --extra-arg=-DPAGE_SIZE=4096 --extra-arg=-DVERSION=\"git\" --extra-arg=-DHAS_GETRANDOM *.h *.c

Signed-off-by: Peter Foley &lt;pefoley@google.com&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Support build systems like bazel that check that headers are
self-contained.

Also update includes so that clang-include-cleaner succeeds.

Tested with:
clang-include-cleaner-19 --extra-arg=-D_GNU_SOURCE --extra-arg=-DPAGE_SIZE=4096 --extra-arg=-DVERSION=\"git\" --extra-arg=-DHAS_GETRANDOM *.h *.c

Signed-off-by: Peter Foley &lt;pefoley@google.com&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Fix build when HAS_GETRANDOM is undefined</title>
<updated>2026-02-24T11:09:48+00:00</updated>
<author>
<name>Peter Foley</name>
<email>pefoley@google.com</email>
</author>
<published>2026-02-23T17:27:39+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=66e59418a88751ee5309bf9dfc9ce7d34e211b4e'/>
<id>66e59418a88751ee5309bf9dfc9ce7d34e211b4e</id>
<content type='text'>
e.g.
util.c:1163:14: error: use of undeclared identifier 'dev_random'; did you mean 'raw_random'?
 1163 |                 ret = read(dev_random, (char *)buf + random_read,

Fixes: 71d5deed5eed ("util: Add general low-level random bytes helper")
Signed-off-by: Peter Foley &lt;pefoley@google.com&gt;
Reviewed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Reviewed-by: Laurent Vivier &lt;lvivier@redhat.com&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
e.g.
util.c:1163:14: error: use of undeclared identifier 'dev_random'; did you mean 'raw_random'?
 1163 |                 ret = read(dev_random, (char *)buf + random_read,

Fixes: 71d5deed5eed ("util: Add general low-level random bytes helper")
Signed-off-by: Peter Foley &lt;pefoley@google.com&gt;
Reviewed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Reviewed-by: Laurent Vivier &lt;lvivier@redhat.com&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>util: Be more defensive about buffer overruns in read_file()</title>
<updated>2026-01-10T18:27:40+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2026-01-07T01:46:02+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=0bd2e6883d560c038e49a3cf68984d63f87ca67a'/>
<id>0bd2e6883d560c038e49a3cf68984d63f87ca67a</id>
<content type='text'>
clang-21.1.7 complains about read_file(), thinking that total_read might
come to exceed buf_size, leading to an out of bounds access at the end of
the function.  In fact, the semantics of read()'s return mean this can't
ever happen.  But we already have to check for the total_read == buf_size
case, so it's basically free to change it to &gt;= and suppress the error.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Reviewed-by: Laurent Vivier &lt;lvivier@redhat.com&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
clang-21.1.7 complains about read_file(), thinking that total_read might
come to exceed buf_size, leading to an out of bounds access at the end of
the function.  In fact, the semantics of read()'s return mean this can't
ever happen.  But we already have to check for the total_read == buf_size
case, so it's basically free to change it to &gt;= and suppress the error.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Reviewed-by: Laurent Vivier &lt;lvivier@redhat.com&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>pasta: Clean up waiting pasta child on failures</title>
<updated>2025-12-12T21:23:14+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2025-12-11T03:54:36+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=87f1a917d881d0881d6db5fdc2345f345a0e30d1'/>
<id>87f1a917d881d0881d6db5fdc2345f345a0e30d1</id>
<content type='text'>
When pasta is invoked with a command rather than an existing namespace to
attach to, it spawns a child process to run a shell or other command.  We
create that process during conf(), since we need the namespace to exist for
much of our setup.  However, we don't want the specified command to run
until the pasta network interface is ready for use.  Therefore,
pasta_spawn_cmd() executing in the child waits before exec()ing.  main()
signals the child to continue with SIGUSR1 shortly before entering the
main forwarding loop.

This has the downside that if we exit due to any kind of failure between
conf() and the SIGUSR1, the child process will be around waiting
indefinitely.  The user must manually clean this up.

Make this cleaner, by having the child use PR_SET_PDEATHSIG to have
itself killed if the parent dies during this window.  Technically
speaking this is racy: if the parent dies before the child can call
the prctl() it will be left zombie-like as before.  However, as long
as the parent completes pasta_wait_for_ns() before dying, I wasn't
able to trigger the race.  Since the consequences of this going wrong
are merely a bit ugly, I think that's good enough.

Suggested-by: Paul Holzinger &lt;pholzing@redhat.com&gt;
Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Reviewed-by: Paul Holzinger &lt;pholzing@redhat.com&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When pasta is invoked with a command rather than an existing namespace to
attach to, it spawns a child process to run a shell or other command.  We
create that process during conf(), since we need the namespace to exist for
much of our setup.  However, we don't want the specified command to run
until the pasta network interface is ready for use.  Therefore,
pasta_spawn_cmd() executing in the child waits before exec()ing.  main()
signals the child to continue with SIGUSR1 shortly before entering the
main forwarding loop.

This has the downside that if we exit due to any kind of failure between
conf() and the SIGUSR1, the child process will be around waiting
indefinitely.  The user must manually clean this up.

Make this cleaner, by having the child use PR_SET_PDEATHSIG to have
itself killed if the parent dies during this window.  Technically
speaking this is racy: if the parent dies before the child can call
the prctl() it will be left zombie-like as before.  However, as long
as the parent completes pasta_wait_for_ns() before dying, I wasn't
able to trigger the race.  Since the consequences of this going wrong
are merely a bit ugly, I think that's good enough.

Suggested-by: Paul Holzinger &lt;pholzing@redhat.com&gt;
Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Reviewed-by: Paul Holzinger &lt;pholzing@redhat.com&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>treewide: Introduce passt_exit() helper</title>
<updated>2025-12-12T21:20:02+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2025-12-11T03:54:35+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=e6612fe0a7cf4860b0d81d3b886f95273d979d1d'/>
<id>e6612fe0a7cf4860b0d81d3b886f95273d979d1d</id>
<content type='text'>
In d0006fa78 ("treewide: use _exit() over exit()"), we replaced use of
the normal exit(3) with direct calls to _exit(2).  That was because glibc
exit(3) made some unexpected futex() calls, which hit our seccomp profile.

We've since had some bugs due to missing the extra cleanup that exit(3)
implemented, for which we've added explicit cleanup calls.  Specifically,
we have fflush() calls in some places to avoid leaving incomplete messages
on stdout/stderr, and in other places fsync_pcap_and_log() to avoid
leaving incomplete log or pcap files.

It's easy to forget these when adding new error paths, so instead,
implement our own passt_exit() wrapper to perform vital cleanup then call
_exit(2).  This also provides an obvious place to add any additional
cleanups we discover we need in future.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In d0006fa78 ("treewide: use _exit() over exit()"), we replaced use of
the normal exit(3) with direct calls to _exit(2).  That was because glibc
exit(3) made some unexpected futex() calls, which hit our seccomp profile.

We've since had some bugs due to missing the extra cleanup that exit(3)
implemented, for which we've added explicit cleanup calls.  Specifically,
we have fflush() calls in some places to avoid leaving incomplete messages
on stdout/stderr, and in other places fsync_pcap_and_log() to avoid
leaving incomplete log or pcap files.

It's easy to forget these when adding new error paths, so instead,
implement our own passt_exit() wrapper to perform vital cleanup then call
_exit(2).  This also provides an obvious place to add any additional
cleanups we discover we need in future.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: Adaptive interval based on RTT for socket-side acknowledgement checks</title>
<updated>2025-12-08T08:15:36+00:00</updated>
<author>
<name>Stefano Brivio</name>
<email>sbrivio@redhat.com</email>
</author>
<published>2025-12-03T19:04:21+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=000601ba86da0d876fc91e0813a1e752540666f1'/>
<id>000601ba86da0d876fc91e0813a1e752540666f1</id>
<content type='text'>
A fixed 10 ms ACK_INTERVAL timer value served us relatively well until
the previous change, because we would generally cause retransmissions
for non-local outbound transfers with relatively high (&gt; 100 Mbps)
bandwidth and non-local but low (&lt; 5 ms) RTT.

Now that retransmissions are less frequent, we don't have a proper
trigger to check for acknowledged bytes on the socket, and will
generally block the sender for a significant amount of time while
we could acknowledge more data, instead.

Store the RTT reported by the kernel using an approximation (exponent),
to keep flow storage size within two (typical) cachelines. Check for
socket updates when half of this time elapses: it should be a good
indication of the one-way delay we're interested in (peer to us).

Representable values are between 100 us and 3.2768 s, and any value
outside this range is clamped to these bounds. This choice appears
to be a good trade-off between additional overhead and throughput.

This mechanism partially overlaps with the "low RTT" destinations,
which we use to infer that a socket is connected to an endpoint to
the same machine (while possibly in a different namespace) if the
RTT is reported as 10 us or less.

This change doesn't, however, conflict with it: we are reading
TCP_INFO parameters for local connections anyway, so we can always
store the RTT approximation opportunistically.

Then, if the RTT is "low", we don't really need a timer to
acknowledge data as we'll always acknowledge everything to the
sender right away. However, we have limited space in the array where
we store addresses of local destination, so the low RTT property of a
connection might toggle frequently. Because of this, it's actually
helpful to always have the RTT approximation stored.

This could probably benefit from a future rework, though, introducing
a more integrated approach between these two mechanisms.

Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A fixed 10 ms ACK_INTERVAL timer value served us relatively well until
the previous change, because we would generally cause retransmissions
for non-local outbound transfers with relatively high (&gt; 100 Mbps)
bandwidth and non-local but low (&lt; 5 ms) RTT.

Now that retransmissions are less frequent, we don't have a proper
trigger to check for acknowledged bytes on the socket, and will
generally block the sender for a significant amount of time while
we could acknowledge more data, instead.

Store the RTT reported by the kernel using an approximation (exponent),
to keep flow storage size within two (typical) cachelines. Check for
socket updates when half of this time elapses: it should be a good
indication of the one-way delay we're interested in (peer to us).

Representable values are between 100 us and 3.2768 s, and any value
outside this range is clamped to these bounds. This choice appears
to be a good trade-off between additional overhead and throughput.

This mechanism partially overlaps with the "low RTT" destinations,
which we use to infer that a socket is connected to an endpoint to
the same machine (while possibly in a different namespace) if the
RTT is reported as 10 us or less.

This change doesn't, however, conflict with it: we are reading
TCP_INFO parameters for local connections anyway, so we can always
store the RTT approximation opportunistically.

Then, if the RTT is "low", we don't really need a timer to
acknowledge data as we'll always acknowledge everything to the
sender right away. However, we have limited space in the array where
we store addresses of local destination, so the low RTT property of a
connection might toggle frequently. Because of this, it's actually
helpful to always have the RTT approximation stored.

This could probably benefit from a future rework, though, introducing
a more integrated approach between these two mechanisms.

Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp, util: Add function for scaling to linearly interpolated factor, use it</title>
<updated>2025-12-08T07:00:09+00:00</updated>
<author>
<name>Stefano Brivio</name>
<email>sbrivio@redhat.com</email>
</author>
<published>2025-12-07T19:56:53+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=6305b6ceec518c4f8cca4a8e9cc7624e376f84c9'/>
<id>6305b6ceec518c4f8cca4a8e9cc7624e376f84c9</id>
<content type='text'>
Right now, the only need for this kind of function comes from
tcp_get_sndbuf(), which calculates the amount of sending buffer we
want to use depending on its own size: we want to use more of it
if it's smaller, as bookkeeping overhead is usually lower and we rely
on auto-tuning there, and use less of it when it's bigger.

For this purpose, the new function is overly generic: @x is the same
as @y, that is, we want to use more or less of the buffer depending
on the size of the buffer itself.

However, an upcoming change will need that generality, as we'll want
to scale the amount of sending buffer we use depending on another
(scaled) factor.

Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Right now, the only need for this kind of function comes from
tcp_get_sndbuf(), which calculates the amount of sending buffer we
want to use depending on its own size: we want to use more of it
if it's smaller, as bookkeeping overhead is usually lower and we rely
on auto-tuning there, and use less of it when it's bigger.

For this purpose, the new function is overly generic: @x is the same
as @y, that is, we want to use more or less of the buffer depending
on the size of the buffer itself.

However, an upcoming change will need that generality, as we'll want
to scale the amount of sending buffer we use depending on another
(scaled) factor.

Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
