passt, branch 2024_09_06.6b38f07

apparmor: Allow read access to /proc/sys/net/ipv4/ip_local_port_range

2024-09-06T13:34:06+00:00

...for both passt and pasta: use passt's abstraction for this.

Fixes: eedc81b6ef55 ("fwd, conf: Probe host's ephemeral ports")
Signed-off-by: Stefano Brivio

selinux: Allow read access to /proc/sys/net/ipv4/ip_local_port_range

2024-09-06T13:34:06+00:00

Since commit eedc81b6ef55 ("fwd, conf: Probe host's ephemeral ports"),
we might need to read from /proc/sys/net/ipv4/ip_local_port_range in
both passt and pasta.

While pasta was already allowed to open and write /proc/sys/net
entries, read access was missing in SELinux's type enforcement: add
that.

In passt, instead, this is the first time we need to access an entry
there: add everything we need.

Fixes: eedc81b6ef55 ("fwd, conf: Probe host's ephemeral ports")
Signed-off-by: Stefano Brivio

tap: Don't risk truncating frames on full buffer in tap_pasta_input()

2024-09-06T11:56:46+00:00

tap_pasta_input() keeps reading frames from the tap device until the
buffer is full.  However, this has an ugly edge case, when we get close
to buffer full, we will provide just the remaining space as a read()
buffer.  If this is shorter than the next frame to read, the tap device
will truncate the frame and discard the remainder.

Adjust the code to make sure we always have room for a maximum size frame.

Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio

tap: Restructure in tap_pasta_input()

2024-09-06T11:56:43+00:00

tap_pasta_input() has a rather confusing structure, using two gotos.
Remove these by restructuring the function to have the main loop condition
based on filling our buffer space, with errors or running out of data
treated as the exception, rather than the other way around.  This allows
us to handle the EINTR which triggered the 'restart' goto with a continue.

The outer 'redo' was triggered if we completely filled our buffer, to flush
it and do another pass.  This one is unnecessary since we don't (yet) use
EPOLLET on the tap device: if there's still more data we'll get another
event and re-enter the loop.

Along the way handle a couple of extra edge cases:
 - Check for EWOULDBLOCK as well as EAGAIN for the benefit of any future
   ports where those might not have the same value
 - Detect EOF on the tap device and exit in that case

Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio

tap: Improve handling of EINTR in tap_passt_input()

2024-09-06T11:56:41+00:00

When tap_passt_input() gets an error from recv() it (correctly) does not
print any error message for EINTR, EAGAIN or EWOULDBLOCK.  However in all
three cases it returns from the function.  That makes sense for EAGAIN and
EWOULDBLOCK, since we then want to wait for the next EPOLLIN event before
trying again.  For EINTR, however, it makes more sense to retry immediately
- as it stands we're likely to get a renewer EPOLLIN event immediately in
that case, since we're using level triggered signalling.

So, handle EINTR separately by immediately retrying until we succeed or
get a different type of error.

Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio

tap: Split out handling of EPOLLIN events

2024-09-06T11:56:37+00:00

Currently, tap_handler_pas{st,ta}() check for EPOLLRDHUP, EPOLLHUP and
EPOLLERR events, then assume anything left is EPOLLIN.  We have some future
cases that may want to also handle EPOLLOUT, so in preparation explicitly
handle EPOLLIN, moving the logic to a subfunction.

Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio

util: Fix order of operands and carry of one second in timespec_diff_us()

2024-09-06T11:01:34+00:00

If the nanoseconds of the minuend timestamp are less than the
nanoseconds of the subtrahend timestamp, we need to carry one second
in the subtraction.

I subtracted this second from the minuend, but didn't actually carry
it in the subtraction of nanoseconds, and logged timestamps would jump
back whenever we switched to the first branch of timespec_diff_us()
from the second one.

Most likely, the reason why I didn't carry the second is that I
instinctively thought that swapping the operands would have the same
effect. But it doesn't, in general: that only happens with arithmetic
in modulo powers of 2. Undo the swap as well.

Signed-off-by: Stefano Brivio 
Reviewed-by: David Gibson

cppcheck: Work around some cppcheck 2.15.0 redundantInitialization warnings

2024-09-06T10:54:20+00:00

cppcheck-2.15.0 has apparently broadened when it throws a warning about
redundant initialization to include some cases where we have an initializer
for some fields, but then set other fields in the function body.

This is arguably a false positive: although we are technically overwriting
the zero-initialization the compiler supplies for fields not explicitly
initialized, this sort of construct makes sense when there are some fields
we know at the top of the function where the initializer is, but others
that require more complex calculation.

That said, in the two places this shows up, it's pretty easy to work
around.  The results are arguably slightly clearer than what we had, since
they move the parts of the initialization closer together.

So do that rather than having ugly suppressions or dealing with the
tedious process of reporting a cppcheck false positive.

Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio

tcp: Use EPOLLET for any state of not established connections

2024-09-06T10:54:16+00:00

Currently, for not established connections, we monitor sockets with
edge-triggered events (EPOLLET) if we are in the TAP_SYN_RCVD state
(outbound connection being established) but not in the
TAP_SYN_ACK_SENT case of it (socket is connected, and we sent SYN,ACK
to the container/guest).

While debugging https://bugs.passt.top/show_bug.cgi?id=94, I spotted
another possibility for a short EPOLLRDHUP storm (10 seconds), which
doesn't seem to happen in actual use cases, but I could reproduce it:
start a connection from a container, while dropping (using netfilter)
ACK segments coming out of the container itself.

On the server side, outside the container, accept the connection and
shutdown the writing side of it immediately.

At this point, we're in the TAP_SYN_ACK_SENT case (not just a mere
TAP_SYN_RCVD state), we get EPOLLRDHUP from the socket, but we don't
have any reasonable way to handle it other than waiting for the tap
side to complete the three-way handshake. So we'll just keep getting
this EPOLLRDHUP until the SYN_TIMEOUT kicks in.

Always enable EPOLLET when EPOLLRDHUP is the only epoll event we
subscribe to: in this case, getting multiple EPOLLRDHUP reports is
totally useless.

In the only remaining non-established state, SOCK_ACCEPTED, for
inbound connections, we're anyway discarding EPOLLRDHUP events until
we established the conection, because we don't know what to do with
them until we get an answer from the tap side, so it's safe to enable
EPOLLET also in that case.

Link: https://bugs.passt.top/show_bug.cgi?id=94
Signed-off-by: Stefano Brivio

udp: Handle more error conditions in udp_sock_errs()

2024-09-06T10:53:38+00:00

udp_sock_errs() reads out everything in the socket error queue.  However
we've seen some cases[0] where an EPOLLERR event is active, but there isn't
anything in the queue.

One possibility is that the error is reported instead by the SO_ERROR
sockopt.  Check for that case and report it as best we can.  If we still
get an EPOLLERR without visible error, we have no way to clear the error
state, so treat it as an unrecoverable error.

[0] https://github.com/containers/podman/issues/23686#issuecomment-2324945010

Link: https://bugs.passt.top/show_bug.cgi?id=95
Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio