passt, branch podman_25959

Debugging changes for Podman issue #25959

2025-04-29T15:47:15+00:00

Link: https://github.com/containers/podman/issues/25959
Signed-off-by: Stefano Brivio

conf: allow --fd 0

2025-04-28T12:01:17+00:00

inetd-style socket passing traditionally starts a service with a
connected socket on file descriptors 0 and 1.  passt disallowing
obtaining its socket from either of these descriptors made it
difficult to use with super-servers providing this interface — in my
case I wanted to use passt with s6-ipcserver[1].  Since (as far as I
can tell) passt does not use standard input for anything else (unlike
standard output), it should be safe to relax the restrictions on --fd
to allow setting it to 0, enabling this use case.

Link: https://skarnet.org/software/s6/s6-ipcserver.html [1]
Signed-off-by: Alyssa Ross 
Signed-off-by: Stefano Brivio

udp: Translate offender addresses for ICMP messages

2025-04-22T10:42:05+00:00

We've recently added support for propagating ICMP errors related to a UDP
flow from the host to the guest, by handling the extended UDP error on the
socket and synthesizing a suitable ICMP on the tap interface.

Currently we create that ICMP with a source address of the "offender" from
the extended error information - the source of the ICMP error received on
the host.  However, we don't translate this address for cases where we NAT
between host and guest.  This means (amongst other things) that we won't
get a "Connection refused" error as expected if send data from the guest to
the --map-host-loopback address.  The error comes from 127.0.0.1 on the
host, which doesn't make sense on the tap interface and will be discarded
by the guest.

Because ICMP errors can be sent by an intermediate host, not just by the
endpoints of the flow, we can't handle this translation purely with the
information in the flow table entry.  We need to explicitly translate this
address by our NAT rules, which we can do with the nat_inbound() helper.

Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio

udp: Rework offender address handling in udp_sock_recverr()

2025-04-22T10:42:03+00:00

Make a number of changes to udp_sock_recverr() to improve the robustness
of how we handle addresses.

 * Get the "offender" address (source of the ICMP packet) using the
   SO_EE_OFFENDER() macro, reducing assumptions about structure layout.
 * Parse the offender sockaddr using inany_from_sockaddr()
 * Check explicitly that the source and destination pifs are what we
   expect.  Previously we checked something that was probably equivalent
   in practice, but isn't strictly speaking what we require for the rest
   of the code.
 * Verify that for an ICMPv4 error we also have an IPv4 source/offender
   and destination/endpoint address
 * Verify that for an ICMPv6 error we have an IPv6 endpoint
 * Improve debug reporting of any failures

Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio

treewide: Improve robustness against sockaddrs of unexpected family

2025-04-22T10:42:00+00:00

inany_from_sockaddr() expects a socket address of family AF_INET or
AF_INET6 and ASSERT()s if it gets anything else.  In many of the callers we
can handle an unexpected family more gracefully, though, e.g. by failing
a single flow rather than killing passt.

Change inany_from_sockaddr() to return an error instead of ASSERT()ing,
and handle those errors in the callers.  Improve the reporting of any such
errors while we're at it.

With this greater robustness, allow inany_from_sockaddr() to take a void *
rather than specifically a union sockaddr_inany *.

Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio

fwd: Split out helpers for port-independent NAT

2025-04-22T10:41:47+00:00

Currently the functions fwd_nat_from_*() make some address translations
based on both the IP address and protocol port numbers, and others based
only on the address.  We have some upcoming cases where it's useful to use
the IP-address-only translations separately, so split them out into helper
functions.

Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio

udp: Propagate errors on listening and brand new sockets

2025-04-15T17:56:16+00:00

udp_sock_recverr() processes errors on UDP sockets and attempts to
propagate them as ICMP packets on the tap interface.  To do this it
currently requires the flow with which the error is associated as a
parameter.  If that's missing it will clear the error condition, but not
propagate it.

That means that we largely ignore errors on "listening" sockets.  It also
means we may discard some errors on flow specific sockets if they occur
very shortly after the socket is created.  In udp_flush_flow() we need to
clear any datagrams received between bind() and connect() which might not
be associated with the "final" flow for the socket.  If we get errors
before that point we'll ignore them in the same way because we don't know
the flow they're associated with in advance.

This can happen in practice if we have errors which occur almost
immediately after connect(), such as ECONNREFUSED when we connect() to a
local address where nothing is listening.

Between the extended error message itself and the PKTINFO information we
do actually have enough information to find the correct flow.  So, rather
than ignoring errors where we don't have a flow "hint", determine the flow
the hard way in udp_sock_recverr().

Signed-off-by: David Gibson 
[sbrivio: Change warn() to debug() in udp_sock_recverr()]
Signed-off-by: Stefano Brivio

udp: Minor re-organisation of udp_sock_recverr()

2025-04-15T17:49:06+00:00

Usually we work with the "exit early" flow style, where we return early
on "error" conditions in functions.  We don't currently do this in
udp_sock_recverr() for the case where we don't have a flow to associate
the error with.

Reorganise to use the "exit early" style, which will make some subsequent
changes less awkward.

Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio

udp: Add udp_pktinfo() helper

2025-04-15T17:48:35+00:00

Currently we open code parsing the control message for IP_PKTINFO in
udp_peek_addr().  We have an upcoming case where we want to parse PKTINFO
in another place, so split this out into a helper function.

While we're there, make the parsing a bit more robust: scan all cmsgs to
look for the one we want, rather than assuming there's only one.

Signed-off-by: David Gibson 
[sbrivio: udp_pktinfo(): Fix typo in comment and change err() to debug()]
Signed-off-by: Stefano Brivio

udp: Deal with errors as we go in udp_sock_fwd()

2025-04-15T17:45:19+00:00

When we get an epoll event on a listening socket, we first deal with any
errors (udp_sock_errs()), then with any received packets (udp_sock_fwd()).
However, it's theoretically possible that new errors could get flagged on
the socket after we call udp_sock_errs(), in which case we could get errors
returned in in udp_sock_fwd() -> udp_peek_addr() -> recvmsg().

In fact, we do deal with this correctly, although the path is somewhat
non-obvious.  The recvmsg() error will cause us to bail out of
udp_sock_fwd(), but the EPOLLERR event will now be flagged, so we'll come
back here next epoll loop and call udp_sock_errs().

Except.. we call udp_sock_fwd() from udp_flush_flow() as well as from
epoll events.  This is to deal with any packets that arrived between bind()
and connect(), and so might not be associated with the socket's intended
flow.  This expects udp_sock_fwd() to flush _all_ queued datagrams, so that
anything received later must be for the correct flow.

At the moment, udp_sock_errs() might fail to flush all datagrams if errors
occur.  In particular this can happen in practice for locally reported
errors which occur immediately after connect() (e.g. connecting to a local
port with nothing listening).

We can deal with the problem case, and also make the flow a little more
natural for the common case by having udp_sock_fwd() call udp_sock_errs()
to handle errors as the occur, rather than trying to deal with all errors
in advance.

Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio