passt - Plug A Simple Socket Transport

	Commit message (Collapse)	Author	Age	Files	Lines
*	fwd: Direct inbound spliced forwards to the guest's external address	David Gibson	11 days	1	-8/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In pasta mode, where addressing permits we "splice" connections, forwarding directly from host socket to guest/container socket without any L2 or L3 processing. This gives us a very large performance improvement when it's possible. Since the traffic is from a local socket within the guest, it will go over the guest's 'lo' interface, and accordingly we set the guest side address to be the loopback address. However this has a surprising side effect: sometimes guests will run services that are only supposed to be used within the guest and are therefore bound to only 127.0.0.1 and/or ::1. pasta's forwarding exposes those services to the host, which isn't generally what we want. Correct this by instead forwarding inbound "splice" flows to the guest's external address. Link: https://github.com/containers/podman/issues/24045 Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	fwd, conf: Probe host's ephemeral ports	David Gibson	2024-08-29	1	-2/+59
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When we forward "all" ports (-t all or -u all), or use an exclude-only range, we don't actually forward all ports - that wouln't leave local ports to use for outgoing connections. Rather we forward all non-ephemeral ports - those that won't be used for outgoing connections or datagrams. Currently we assume the range of ephemeral ports is that recommended by RFC 6335, 49152-65535. However, that's not the range used by default on Linux, 32768-60999 but configurable with the net.ipv4.ip_local_port_range sysctl. We can't really know what range the guest will consider ephemeral, but if it differs too much from the host it's likely to cause problems we can't avoid anyway. So, using the host's ephemeral range is a better guess than using the RFC 6335 range. Therefore, add logic to probe the host's ephemeral range, falling back to the RFC 6335 range if that fails. This has the bonus advantage of reducing the number of ports bound by -t all -u all on most Linux machines thereby reducing kernel memory usage. Specifically this reduces kernel memory usage with -t all -u all from ~380MiB to ~289MiB. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	conf, fwd: Make ephemeral port logic more flexible	David Gibson	2024-08-29	1	-0/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	"Ephemeral" ports are those which the kernel may allocate as local port numbers for outgoing connections or datagrams. Because of that, they're generally not good choices for listening servers to bind to. Thefore when using -t all, -u all or exclude-only ranges, we map only non-ephemeral ports. Our logic for this is a bit rigid though: we assume the ephemeral ports are always a fixed range at the top of the port number space. We also assume PORT_EPHEMERAL_MIN is a multiple of 8, or we won't set the forward bitmap correctly. Make the logic in conf.c more flexible, using a helper moved into fwd.[ch], although we don't change which ports we consider ephemeral (yet). The new handling is undoubtedly more computationally expensive, but since it's a once-off operation at start off, I don't think it really matters. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	fwd, conf: Allow NAT of the guest's assigned address	David Gibson	2024-08-21	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The guest is usually assigned one of the host's IP addresses. That means it can't access the host itself via its usual address. The --map-host-loopback option (enabled by default with the gateway address) allows the guest to contact the host. However, connections forwarded this way appear on the host to have originated from the loopback interface, which isn't always desirable. Add a new --map-guest-addr option, which acts similarly but forwarded connections will go to the host's external address, instead of loopback. If '-a' is used, so the guest's address is not the same as the host's, this will instead forward to whatever host-visible site is shadowed by the guest's assigned address. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	fwd: Distinguish translatable from untranslatable addresses on inbound	David Gibson	2024-08-21	1	-1/+8
\| \| \| \| \| \| \| \| \| \| \| \|	fwd_nat_from_host() needs to adjust the source address for new flows coming from an address which is not accessible to the guest. Currently we always use our_tap_addr or our_tap_ll. However in cases where the address is accessible to the guest via translation (i.e. via --map-host-loopback) then it makes more sense to use that translation, rather than the fallback mapping of our_tap_*. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	conf, fwd: Split notion of gateway/router from guest-visible host address	David Gibson	2024-08-21	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The @gw fields in the ip4_ctx and ip6_ctx give the (host's) default gateway. We use this for two quite distinct things: advertising the gateway that the guest should use (via DHCP, NDP and/or --config-net) and for a limited form of NAT. So that the guest can access services on the host, we map the gateway address within the guest to the loopback address on the host. Using the gateway address for this isn't necessarily the best choice for this purpose, certainly not for all circumstances. So, start off by splitting the notion of these into two different values: @guest_gw which is the gateway address the guest should use and @nat_host_loopback, which is the guest visible address to remap to the host's loopback. Usually nat_host_loopback will have the same value as guest_gw. However when --no-map-gw is specified we leave them unspecified instead. This means when we use nat_host_loopback, we don't need to separately check c->no_map_gw to see if it's relevant. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	fwd: Split notion of "our tap address" from gateway for IPv4	David Gibson	2024-08-21	1	-3/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ip4.gw conflates 3 conceptually different things, which (for now) have the same value: 1. The router/gateway address as seen by the guest 2. An address to NAT to the host with --no-map-gw isn't specified 3. An address to use as source when nothing else makes sense Case 3 occurs in two situations: a) for our DHCP responses - since they come from passt internally there's no naturally meaningful address for them to come from b) for forwarded connections coming from an address that isn't guest accessible (localhost or the guest's own address). (b) occurs even with --no-map-gw, and the expected behaviour of forwarding local connections requires it. For IPv6 role (3) is now taken by ip6.our_tap_ll (which usually has the same value as ip6.gw). For future flexibility we may want to make this "address of last resort" different from the gateway address, so split them logically for IPv4 as well. Specifically, add a new ip4.our_tap_addr field for the address with this role, and initialise it to ip4.gw for now. Unlike IPv6 where we can always get a link-local address, we might not be able to get a (non 0.0.0.0) address here (e.g. if the host is disconnected or only has a point to point link with no gateway address). In that case we have to disable forwarding of inbound connections with guest-inaccessible source addresses. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	fwd: Helpers to clarify what host addresses aren't guest accessible	David Gibson	2024-08-21	1	-11/+87
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We usually avoid NAT, but in a few cases we need to apply address translations. For inbound connections that happens for addresses which make sense to the host but are either inaccessible, or mean a different location from the guest's point of view. Add some helper functions to determine such addresses, and use them in fwd_nat_from_host(). In doing so clarify some of the reasons for the logic. We'll also have further use for these helpers in future. While we're there fix one unneccessary inconsistency between IPv4 and IPv6. We always translated the guest's observed address, but for IPv4 we didn't translate the guest's assigned address, whereas for IPv6 we did. Change this to translate both in all cases for consistency. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	Initialise our_tap_ll to ip6.gw when suitable	David Gibson	2024-08-21	1	-4/+1
\| \| \| \| \| \| \| \| \|	In every place we use our_tap_ll, we only use it as a fallback if the IPv6 gateway address is not link-local. We can avoid that conditional at use time by doing it at initialisation of our_tap_ll instead. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	treewide: Change misleading 'addr_ll' name	David Gibson	2024-08-21	1	-1/+1
\| \| \| \| \| \| \| \| \|	c->ip6.addr_ll is not like c->ip6.addr. The latter is an address for the guest, but the former is an address for our use on the tap link. Rename it accordingly, to 'our_tap_ll'. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	treewide: Use "our address" instead of "forwarding address"	David Gibson	2024-08-21	1	-35/+35
\| \| \| \| \| \| \| \| \| \| \| \|	The term "forwarding address" to indicate the local-to-passt address was well-intentioned, but ends up being kinda confusing. As discussed on a recent call, let's try "our" instead. (While we're there correct an error in flow_initiate_af()s comments where we referred to parameters by the wrong name). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	fwd: Broaden what we consider for DNS specific forwarding rules	David Gibson	2024-07-25	1	-2/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	passt/pasta has options to redirect DNS requests from the guest to a different server address on the host side. Currently, however, only UDP packets to port 53 are considered "DNS requests". This ignores DNS requests over TCP - less common, but certainly possible. It also ignores encrypted DNS requests on port 853. Extend the DNS forwarding logic to handle both of those cases. Link: https://github.com/containers/podman/issues/23239 Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Tested-by: Paul Holzinger <pholzing@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	fwd: Refactor tests in fwd_nat_from_tap() for clarity	David Gibson	2024-07-25	1	-13/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, we start by handling the common case, where we don't translate the destination address, then we modify the tgt side for the special cases. In the process we do comparisons on the tentatively set fields in tgt, which obscures the fact that tgt should be an essentially pure function of ini, and risks people examining fields of tgt that are not yet initialized. To make this clearer, do all our tests on 'ini', constructing tgt from scratch on that basis. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	udp: Remove rdelta port forwarding maps	David Gibson	2024-07-19	1	-12/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In addition to the struct fwd_ports used by both UDP and TCP to track port forwarding, UDP also included an 'rdelta' field, which contained the reverse mapping of the main port map. This was used so that we could properly direct reply packets to a forwarded packet where we change the destination port. This has now been taken over by the flow table: reply packets will match the flow of the originating packet, and that gives the correct ports on the originating side. So, eliminate the rdelta field, and with it struct udp_fwd_ports, which now has no additional information over struct fwd_ports. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	fwd: Update flow forwarding logic for UDP	David Gibson	2024-07-19	1	-4/+23
\| \| \| \| \| \| \| \| \| \| \|	Add logic to the fwd_nat_from_*() functions to forwarding UDP packets. The logic here doesn't exactly match our current forwarding, since our current forwarding has some very strange and buggy edge cases. Instead it's attempting to replicate what appears to be the intended logic behind the current forwarding. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	flow, tcp: Flow based NAT and port forwarding for TCP	David Gibson	2024-07-19	1	-0/+148
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently the code to translate host side addresses and ports to guest side addresses and ports, and vice versa, is scattered across the TCP code. This includes both port redirection as controlled by the -t and -T options, and our special case NAT controlled by the --no-map-gw option. Gather this logic into fwd_nat_from_*() functions for each input interface in fwd.c which take protocol and address information for the initiating side and generates the pif and address information for the forwarded side. This performs any NAT or port forwarding needed. We create a flow_target() helper which applies those forwarding functions as needed to automatically move a flow from INI to TGT state. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	treewide: Replace strerror() calls	Stefano Brivio	2024-06-21	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that we have logging functions embedding perror() functionality, we can make _some_ calls more terse by using them. In many places, the strerror() calls are still more convenient because, for example, they are used in flow debugging functions, or because the return code variable of interest is not 'errno'. While at it, convert a few error messages from a scant perror style to proper failure descriptions. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	treewide: Compilers' name for armv6l and armv7l is "arm"	Stefano Brivio	2024-04-11	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	When I switched from 'uname -m' to 'gcc -dumpmachine' to fetch the architecture name for, among others, seccomp.sh, I didn't realise that "armv6l" and "armv7l" are just Linux kernel names -- compilers just call that "arm". Fix the "syscalls" annotation we use to define seccomp profiles accordingly, otherwise pasta will be terminated on sigreturn() on armv6l and armv7l. Fixes: 213c397492bd ("passt, pasta: Run-time selection of AVX2 build") Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	util: move IP stuff from util.[ch] to ip.[ch]	Laurent Vivier	2024-03-06	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Introduce ip.[ch] file to encapsulate IP protocol handling functions and structures. Modify various files to include the new header ip.h when it's needed. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-ID: <20240303135114.1023026-5-lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	fwd: Rename port_fwd.[ch] and their contents	David Gibson	2024-02-29	1	-0/+155
	Currently port_fwd.[ch] contains helpers related to port forwarding, particular automatic port forwarding. We're planning to allow much more flexible sorts of forwarding, including both port translation and NAT based on the flow table. This will subsume the existing port forwarding logic, so rename port_fwd.[ch] to fwd.[ch] with matching updates to all the names within. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>