passt, branch 2025_04_15.2340bbf

udp: Propagate errors on listening and brand new sockets

2025-04-15T17:56:16+00:00

udp_sock_recverr() processes errors on UDP sockets and attempts to
propagate them as ICMP packets on the tap interface.  To do this it
currently requires the flow with which the error is associated as a
parameter.  If that's missing it will clear the error condition, but not
propagate it.

That means that we largely ignore errors on "listening" sockets.  It also
means we may discard some errors on flow specific sockets if they occur
very shortly after the socket is created.  In udp_flush_flow() we need to
clear any datagrams received between bind() and connect() which might not
be associated with the "final" flow for the socket.  If we get errors
before that point we'll ignore them in the same way because we don't know
the flow they're associated with in advance.

This can happen in practice if we have errors which occur almost
immediately after connect(), such as ECONNREFUSED when we connect() to a
local address where nothing is listening.

Between the extended error message itself and the PKTINFO information we
do actually have enough information to find the correct flow.  So, rather
than ignoring errors where we don't have a flow "hint", determine the flow
the hard way in udp_sock_recverr().

Signed-off-by: David Gibson 
[sbrivio: Change warn() to debug() in udp_sock_recverr()]
Signed-off-by: Stefano Brivio

udp: Minor re-organisation of udp_sock_recverr()

2025-04-15T17:49:06+00:00

Usually we work with the "exit early" flow style, where we return early
on "error" conditions in functions.  We don't currently do this in
udp_sock_recverr() for the case where we don't have a flow to associate
the error with.

Reorganise to use the "exit early" style, which will make some subsequent
changes less awkward.

Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio

udp: Add udp_pktinfo() helper

2025-04-15T17:48:35+00:00

Currently we open code parsing the control message for IP_PKTINFO in
udp_peek_addr().  We have an upcoming case where we want to parse PKTINFO
in another place, so split this out into a helper function.

While we're there, make the parsing a bit more robust: scan all cmsgs to
look for the one we want, rather than assuming there's only one.

Signed-off-by: David Gibson 
[sbrivio: udp_pktinfo(): Fix typo in comment and change err() to debug()]
Signed-off-by: Stefano Brivio

udp: Deal with errors as we go in udp_sock_fwd()

2025-04-15T17:45:19+00:00

When we get an epoll event on a listening socket, we first deal with any
errors (udp_sock_errs()), then with any received packets (udp_sock_fwd()).
However, it's theoretically possible that new errors could get flagged on
the socket after we call udp_sock_errs(), in which case we could get errors
returned in udp_sock_fwd() -> udp_peek_addr() -> recvmsg().

In fact, we do deal with this correctly, although the path is somewhat
non-obvious.  The recvmsg() error will cause us to bail out of
udp_sock_fwd(), but the EPOLLERR event will now be flagged, so we'll come
back here next epoll loop and call udp_sock_errs().

Except that we call udp_sock_fwd() from udp_flush_flow() as well as from
epoll events.  This is to deal with any packets that arrived between bind()
and connect(), and so might not be associated with the socket's intended
flow.  This expects udp_sock_fwd() to flush _all_ queued datagrams, so that
anything received later must be for the correct flow.

At the moment, udp_sock_fwd() might fail to flush all datagrams if errors
occur.  In particular this can happen in practice for locally reported
errors which occur immediately after connect() (e.g. connecting to a local
port with nothing listening).

We can deal with the problem case, and also make the flow a little more
natural for the common case by having udp_sock_fwd() call udp_sock_errs()
to handle errors as they occur, rather than trying to deal with all errors
in advance.
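
The general shape of "deal with errors as we go" can be sketched as below.
This is only an illustration under assumptions, not the real udp_sock_fwd():
flush_socket() and clear_errqueue() are hypothetical names, datagrams are
simply discarded, and the cap on consecutive errors is arbitrary.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical stand-in for udp_sock_errs(): discard everything queued on
 * the error queue of @s.  Returns the number of errors cleared, -1 on an
 * unexpected failure.
 */
int clear_errqueue(int s)
{
	int n = 0;

	for (;;) {
		char ctrl[512];
		struct msghdr mh = { 0 };

		mh.msg_control = ctrl;
		mh.msg_controllen = sizeof(ctrl);

		if (recvmsg(s, &mh, MSG_ERRQUEUE | MSG_DONTWAIT) < 0) {
			if (errno == EINTR)
				continue;
			if (errno == EAGAIN || errno == EWOULDBLOCK)
				return n;	/* Queue is empty now */
			return -1;
		}
		n++;
	}
}

/* Read and discard all datagrams queued on @s, clearing errors as they show
 * up instead of stopping at the first one.  Returns the number of datagrams
 * flushed, -1 on failure.
 */
int flush_socket(int s)
{
	int n = 0, errs = 0;

	for (;;) {
		char buf[2048];

		if (recv(s, buf, sizeof(buf), MSG_DONTWAIT) >= 0) {
			n++;
			continue;
		}

		if (errno == EINTR)
			continue;
		if (errno == EAGAIN || errno == EWOULDBLOCK)
			return n;	/* Nothing left to read */

		/* Socket error, e.g. ECONNREFUSED: clear it and carry on,
		 * but don't loop forever on a persistent failure.
		 */
		assert(errs >= 0);
		if (clear_errqueue(s) < 0 || ++errs > 64)
			return -1;
	}
}
```

The point is that a socket error no longer ends the loop early: once
flush_socket() returns a non-negative count, the receive queue was empty at
that moment.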

Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio

udp: Pass socket & flow information directly to error handling functions

2025-04-15T17:45:09+00:00

udp_sock_recverr() and udp_sock_errs() take an epoll reference from which
they obtain both the socket fd to receive errors from, and - for flow
specific sockets - the flow and side the socket is associated with.

We have some upcoming cases where we want to clear errors when we're not
directly associated with receiving an epoll event, so it's not natural to
have an epoll reference.  Therefore, make these functions take the socket
and flow from explicit parameters.

Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio

udp: Be quieter about errors on UDP receive

2025-04-15T17:43:56+00:00

If we get an error on UDP receive, either in udp_peek_addr() or
udp_sock_recv(), we'll print an error message.  However, this could be
a perfectly routine UDP error triggered by an ICMP, which need not go to
the error log.

This doesn't usually happen, because before receiving we typically clear
the error queue from udp_sock_errs().  However, it's possible an error
could be flagged after udp_sock_errs() but before we receive.  So it's
better to handle this error "silently" (trace level only).  We'll bail out
of the receive, return to the epoll loop, and get an EPOLLERR where we'll
handle and report the error properly.

In particular there's one situation that can trigger this case much more
easily.  If we start a new outbound UDP flow to a local destination with
nothing listening, we'll get a more or less immediate connection refused
error.  So, we'll get that error on the very first receive after the
connect().  That will occur in udp_flow_defer() -> udp_flush_flow() ->
udp_sock_fwd() -> udp_peek_addr() -> recvmsg().  This path doesn't call
udp_sock_errs() first, so it doesn't get even the imperfect protection we
have most of the time.

Fixes: 84ab1305faba ("udp: Polish udp_vu_sock_info() and remove from vu specific code")
Fixes: 69e5393c3722 ("udp: Move some more of sock_handler tasks into sub-functions")
Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio

udp: Fix breakage of UDP error handling by PKTINFO support

2025-04-15T17:43:00+00:00

We recently enabled the IP_PKTINFO / IPV6_RECVPKTINFO socket options on our
UDP sockets.  This lets us obtain and properly handle the specific local
address used when we're "listening" with a socket on 0.0.0.0 or ::.

However, the PKTINFO cmsgs this option generates appear on error queue
messages as well as regular datagrams.  udp_sock_recverr() doesn't expect
this and so flags an unrecoverable error when it can't parse the control
message.

Correct this by adding space in udp_sock_recverr()'s control buffer for the
additional PKTINFO data, and scan through all cmsgs for the RECVERR, rather
than only looking at the first one.

Link: https://bugs.passt.top/show_bug.cgi?id=99
Fixes: f4b0dd8b0685 ("udp: Use PKTINFO cmsgs to get destination address for received datagrams")
Reported-by: Stefano Brivio 
Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio

conf: Honour --dns-forward for local resolver even with --no-map-gw

2025-04-15T17:42:59+00:00

If the first resolver listed in the host's /etc/resolv.conf is a
loopback address, and --no-map-gw is given, we automatically conclude
that the resolver is not reachable, discard it, and, if it's the only
nameserver listed in /etc/resolv.conf, we'll warn that we:

  Couldn't get any nameserver address

However, this isn't true in the general case: the user might have passed
--dns-forward, and in that case, while we won't map the address of the
default gateway to the host, we're still supposed to map that
particular address. Otherwise, in this common Podman usage:

  pasta --config-net --dns-forward 169.254.1.1 -t none -u none -T none -U none --no-map-gw --netns /run/user/1000/netns/netns-c02a8d8f-6ee3-902e-33c5-317e0f24e0af --map-guest-addr 169.254.1.2

and with a loopback address in /etc/resolv.conf, we'll unexpectedly
refuse to forward DNS queries:

  # nslookup passt.top 169.254.1.1
  ;; connection timed out; no servers could be reached

To fix this, make an exception for --dns-forward: if &c->ip4.dns_match
or &c->ip6.dns_match are set in add_dns_resolv4() / add_dns_resolv6(),
use that address as guest-facing resolver.

We already set 'dns_host' to the address we found in /etc/resolv.conf,
that's correct in this case and it makes us forward queries as
expected.
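
The resulting decision for IPv4 can be sketched roughly as follows.  This is
a simplified illustration, not the code in add_dns_resolv4(): struct
dns4_conf, its map_gw field and pick_guest_dns4() are hypothetical, and only
the dns_match / dns_host names come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical, trimmed-down stand-in for the IPv4 configuration */
struct dns4_conf {
	struct in_addr dns_match; /* --dns-forward address, 0.0.0.0 if unset */
	struct in_addr map_gw;	  /* Gateway mapping, 0.0.0.0 with --no-map-gw */
	struct in_addr dns_host;  /* Host resolver we forward queries to */
};

static bool unspec(struct in_addr a)
{
	return a.s_addr == htonl(INADDR_ANY);
}

static bool loopback(struct in_addr a)
{
	return (ntohl(a.s_addr) >> 24) == 127;
}

/* Pick the resolver address to advertise to the guest for host resolver @ns.
 * Returns false if @ns can't be reached from the guest at all.
 */
bool pick_guest_dns4(struct dns4_conf *c, struct in_addr ns,
		     struct in_addr *guest)
{
	assert(c && guest);

	if (unspec(c->dns_host))
		c->dns_host = ns;

	if (!loopback(ns)) {
		*guest = ns;		/* Directly reachable */
		return true;
	}

	if (!unspec(c->dns_match)) {
		/* --dns-forward given: honour it, even with --no-map-gw */
		*guest = c->dns_match;
		return true;
	}

	if (unspec(c->map_gw))
		return false;		/* --no-map-gw, nothing to map to */

	*guest = c->dns_match = c->map_gw;
	return true;
}
```

With --dns-forward 169.254.1.1 and 127.0.0.53 in /etc/resolv.conf, this
sketch advertises 169.254.1.1 to the guest and keeps 127.0.0.53 as the host
side resolver.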

I'm not changing the man page as the current description of
--dns-forward is already consistent with the new behaviour: there's no
described way in which --no-map-gw should affect it.

Reported-by: Andrew Sayers 
Link: https://bugs.passt.top/show_bug.cgi?id=111
Suggested-by: Paul Holzinger 
Signed-off-by: Stefano Brivio 
Reviewed-by: Paul Holzinger 
Reviewed-by: David Gibson

conf: Split add_dns_resolv() into separate IPv4 and IPv6 versions

2025-04-15T17:42:57+00:00

Not really valuable by itself, but dropping one level of nested blocks
makes the next change more convenient.

No functional changes intended.

Signed-off-by: Stefano Brivio 
Reviewed-by: Paul Holzinger 
Reviewed-by: David Gibson

udp, udp_flow: Track our specific address on socket interfaces

2025-04-10T17:46:16+00:00

So far for UDP flows (like TCP connections) we didn't record our address
(oaddr) in the flow table entry for socket based pifs.  That's because we
didn't have that information when a flow was initiated by a datagram coming
to a "listening" socket with 0.0.0.0 or :: address.  Even when we did have
the information, we didn't record it, to simplify address matching on
lookups.

This meant that in some circumstances we could send replies on a UDP flow
from a different address than the originating request came to, which is
surprising and breaks certain setups.

We now have code in udp_peek_addr() which does determine our address for
incoming UDP datagrams.  We can use that information to properly populate
oaddr in the flow table for flows initiated from a socket.

In order to be able to consistently match datagrams to flows, we must
*always* have a specific oaddr, not an unspecified address (that's how the
flow hash table works).  So, we also need to fill in oaddr correctly for
flows we initiate *to* sockets.  Our forwarding logic doesn't specify
oaddr here, letting the kernel decide based on the routing table.  In this
case we need to call getsockname() after connect()ing the socket to find
which local address the kernel picked.
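
As a minimal sketch of that last step (IPv4 only, with a hypothetical helper
name, not the actual flow code):

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: connect() a new UDP socket to @dst, then ask the
 * kernel which local address and port it picked, storing them in @local.
 * Returns the socket fd, -1 on failure.
 */
int udp_connect_local(const struct sockaddr_in *dst, struct sockaddr_in *local)
{
	socklen_t sl = sizeof(*local);
	int s;

	assert(dst && local);

	s = socket(AF_INET, SOCK_DGRAM, 0);
	if (s < 0)
		return -1;

	/* For UDP, connect() sends nothing: it fixes the peer and lets the
	 * kernel choose our address based on the routing table.
	 */
	if (connect(s, (const struct sockaddr *)dst, sizeof(*dst)) < 0 ||
	    getsockname(s, (struct sockaddr *)local, &sl) < 0) {
		close(s);
		return -1;
	}

	return s;
}
```

After this, @local holds a specific address rather than 0.0.0.0, which is
what a flow table lookup keyed on both endpoints needs.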

This adds getsockname() to our seccomp profile for all variants.

Link: https://bugs.passt.top/show_bug.cgi?id=99
Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio