passt, branch 2023_11_07.56d9f6d

Revert "selinux: Drop user_namespace class rules for Fedora 37"

2023-11-07T13:58:02+00:00

This reverts commit 3fb3f0f7a59498bdea1d199eecfdbae6c608f78f: it was
meant as a patch for Fedora 37 (and no later versions), not something
I should have merged upstream.

Signed-off-by: Stefano Brivio

selinux: Allow passt to talk over unconfined_t UNIX domain socket for --fd

2023-11-07T11:28:27+00:00

If passt is started with --fd to talk over a pre-opened UNIX domain
socket, we don't really know what label might be associated with it,
but at least for an unconfined_t socket, this bit of policy wouldn't
belong anywhere else: enable that here.

This is rather loose, of course, but on the other hand passt will
sandbox itself into an empty filesystem, so we're not really adding
much to the attack surface except for what --fd is supposed to do.

Reported-by: Matej Hrica 
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2247221
Signed-off-by: Stefano Brivio

log: Match implicit va_start() with va_end() in vlogmsg()

2023-11-07T11:24:27+00:00

According to C99, 7.15.1:

  Each invocation of the va_start and va_copy macros shall be matched
  by a corresponding invocation of the va_end macro in the same
  function

and the same applies to C11. I have yet to come across a platform
where va_end() actually does something, but thus spake the standard.
This would be reported by Coverity as "Missing varargs init or
cleanup" (CWE-573).
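
As an illustration of the rule quoted above, here is a minimal sketch,
not the actual vlogmsg() code: example_format() and example_vformat()
are hypothetical names made up for this example. Each va_start() or
va_copy() is paired with a va_end() in the same function.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical v-variant: works on a private copy of the caller's list */
static int example_vformat(char *buf, size_t size, const char *fmt, va_list ap)
{
	va_list ap2;
	int n;

	va_copy(ap2, ap);	/* the caller still owns ap */
	n = vsnprintf(buf, size, fmt, ap2);
	va_end(ap2);		/* matches va_copy() above */

	return n;
}

/* Hypothetical variadic wrapper around example_vformat() */
static int example_format(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = example_vformat(buf, size, fmt, ap);
	va_end(ap);		/* matches va_start() above, same function */

	return n;
}
```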

Fixes: c0426ff10bc9 ("log: Add vlogmsg()")
Signed-off-by: Stefano Brivio

port_fwd: Don't try to read bound ports from invalid file handles

2023-11-07T11:24:27+00:00

This is a minimal fix for what would be reported by Coverity as
"Improper use of negative value" (CWE-394): port_fwd_init() doesn't
guarantee that all the pre-opened file handles are actually valid.

We should probably warn on failing open() and open_in_ns() in
port_fwd_init(), too, but that's outside the scope of this minimal
fix.

Before commit 5a0485425bc9 ("port_fwd: Pre-open /proc/net/* files
rather than on-demand"), we used to have a single open() call and
a check after it.
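
The kind of check involved can be sketched as follows. This is an
assumption-laden illustration rather than the code in port_fwd.c:
example_read_preopened() is a hypothetical helper, and the only point
is that a negative descriptor (from a failed open()) is rejected
before lseek() or read() ever see it.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: re-read a pre-opened file from the start.
 * Returns the number of bytes read, or -1 if fd is invalid or on error.
 */
static ssize_t example_read_preopened(int fd, char *buf, size_t len)
{
	if (fd < 0)
		return -1;	/* open() failed earlier: nothing to read */

	if (lseek(fd, 0, SEEK_SET) < 0)
		return -1;

	return read(fd, buf, len);
}
```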

Fixes: 5a0485425bc9 ("port_fwd: Pre-open /proc/net/* files rather than on-demand")
Signed-off-by: Stefano Brivio

netlink: Sequence numbers are actually 32 bits wide

2023-11-07T11:22:13+00:00

Harmless, as we use sequence numbers monotonically anyway, but now
clang-tidy reports:

/home/sbrivio/passt/netlink.c:155:7: error: format specifies type 'unsigned short' but the argument has type '__u32' (aka 'unsigned int') [clang-diagnostic-format,-warnings-as-errors]
                    nh->nlmsg_seq, seq);
                    ^
/home/sbrivio/passt/log.h:26:7: note: expanded from macro 'die'
                err(__VA_ARGS__);                                       \
                    ^~~~~~~~~~~
/home/sbrivio/passt/log.h:19:34: note: expanded from macro 'err'
                                        ^~~~~~~~~~~
Suppressed 222820 warnings (222816 in non-user code, 4 NOLINT).
Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
1 warning treated as error
make: *** [Makefile:255: clang-tidy] Error 1
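
For illustration only, one portable way to print a 32-bit sequence
number is the PRIu32 macro from <inttypes.h>. This is a sketch with a
hypothetical helper, example_format_seq(), and a made-up message, and
is not necessarily the conversion specifier the actual fix uses.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format a sequence number mismatch.
 * PRIu32 expands to the right conversion for uint32_t, so values
 * above 65535 are printed in full.
 */
static int example_format_seq(char *buf, size_t size,
			      uint32_t got, uint32_t expected)
{
	return snprintf(buf, size, "seq %" PRIu32 " != %" PRIu32,
			got, expected);
}
```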

Fixes: 9d4ab98d538f ("netlink: Add nl_do() helper for simple operations with error checking")
Signed-off-by: Stefano Brivio

test/perf: Simplify calculation of "omit" time for TCP throughput

2023-11-07T08:56:24+00:00

For the TCP throughput tests, we use iperf3's -O "omit" option which
ignores results for the given time at the beginning of the test.  Currently
we calculate this as 1/6th of the test measurement time.  The purpose of
-O, however, is to skip over the TCP slow start period, which in no way
depends on the overall length of the test.

The slow start time is, roughly speaking:
    log_2 ( max_window_size / MSS ) * round_trip_time
These factors all vary between tests and machines we're running on, but we
can estimate some reasonable bounds for them:
  * The maximum window size is bounded by the buffer sizes at each end,
    which shouldn't exceed 16MiB
  * The MSS varies with the MTU we use, but the smallest we use in tests is
    ~256 bytes
  * Round trip time will vary with the system, but with these essentially
    local transfers it will typically be well under 1ms (on my laptop it is
    closer to 0.03ms)

That gives a worst case slow start time of about 16ms.  Setting an omit
time of 0.1s uniformly is therefore more than enough, and substantially
smaller than what we calculate now for the default case (10s / 6 ~= 1.7s).
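
The arithmetic behind that estimate can be checked with a short sketch.
The function names are hypothetical and not part of the test suite; the
model is the simplified one above, where the window doubles once per
round trip until it reaches the maximum window size.

```c
#include <stdint.h>

/* Number of doublings needed to grow from one MSS to max_window,
 * that is, roughly log_2(max_window / mss) rounded up.
 */
static unsigned example_slow_start_rounds(uint64_t max_window, uint64_t mss)
{
	unsigned rounds = 0;
	uint64_t cwnd = mss;

	if (!mss)
		return 0;	/* avoid looping forever on bogus input */

	while (cwnd < max_window) {
		cwnd *= 2;
		rounds++;
	}

	return rounds;
}

/* Estimated slow start duration in milliseconds */
static double example_slow_start_ms(uint64_t max_window, uint64_t mss,
				    double rtt_ms)
{
	return example_slow_start_rounds(max_window, mss) * rtt_ms;
}
```

With a 16MiB window, a 256 byte MSS and a 1ms round trip time, this
gives 16 rounds and hence 16ms, matching the bound above.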

This reduces total time for the standard benchmark run by around 30s.

Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio

test/perf: Remove unnecessary --pacing-timer options

2023-11-07T08:56:21+00:00

We always set --pacing-timer when invoking iperf3.  However, the iperf3
man page implies this is only relevant for the -b option.  We only use the
-b option for the UDP tests, not TCP, so remove --pacing-timer from the TCP
cases.

Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio

test/perf: "MTU" changes in passt_tcp host to guest aren't useful

2023-11-07T08:56:18+00:00

The TCP packet size used on the passt L2 link (qemu socket) makes a huge
difference to passt/pasta throughput; many of passt's overheads (chiefly
syscalls) are per-packet.

That packet size is largely determined by the MTU on the L2 link, so we
benchmark for a number of different MTUs.  That works well for the guest to
host transfers.  For the host to guest transfers, we purport to test for
different MTUs, but we're not actually adjusting anything interesting.

The host to guest transfers adjust the MTU on the "host's" (actually ns)
loopback interface.  However, that only affects the packet size for the
socket going to passt, not the packet size for the L2 link that passt
manages - passt can and will repack the stream into packets of its own
size.  Since the depacketization on that socket is handled by the kernel it
doesn't have a lot of bearing on passt's performance.

We can't fix this by changing the L2 link MTU from the guest side (as we do
for guest to host), because that would only change the guest's view of the
MTU; passt would still think it has the large MTU.  We could test this by
using the --mtu option to passt, but that would require restarting passt
for each run, which is awkward in the current setup.  So, for now, drop all
the "small MTU" tests for host to guest.

Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio

test/perf: Explicitly control UDP packet length, instead of MTU

2023-11-07T08:56:16+00:00

Packet size can make a big difference to UDP throughput, so it makes sense
to measure it for a variety of different sizes.  Currently we do this by
adjusting the MTU on the relevant interface before running iperf3.

However, the UDP packet size has no inherent connection to the MTU - it's
controlled by the sender, and the MTU just affects whether the packet will
make it through or be fragmented.  The only reason adjusting the MTU works
is because iperf3 bases its default packet size on the (path) MTU.

We can test this more simply by using the -l option to the iperf3 client
to directly control the packet size, instead of adjusting the MTU.

As well as simplifying the tests, this lets us test different packet sizes
for host to ns traffic.  We couldn't do that previously because we don't
have permission to change the MTU on the host.

Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio

test/perf: Small MTUs for spliced TCP aren't interesting

2023-11-07T08:56:13+00:00

Currently we make TCP throughput measurements for spliced connections with
a number of different MTU values.  However, the results from this aren't
really interesting.

Unlike with tap connections, spliced connections only involve the loopback
interface on host and container, not a "real" external interface.  lo
typically has an MTU of 65535 and there is very little reason to ever
change that.  So, the measurements for smaller MTUs are rarely going to be
relevant.

In addition, the fact that we can offload all the {de,}packetization to the
kernel with splice(2) means that the throughput difference between these
MTUs isn't very great anyway.

Remove the short MTUs and only show spliced throughput for the normal
65535 byte loopback MTU.  This reduces runtime of the performance tests on
my laptop by about 1 minute (out of ~24 minutes).

Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio