passt - Plug A Simple Socket Transport

	Commit message (Collapse)	Author	Age	Files	Lines
*	netlink: Add functionality to copy routes from outer namespace	Stefano Brivio	2023-05-23	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Instead of just fetching the default gateway and configuring a single equivalent route in the target namespace, on 'pasta --config-net', it might be desirable in some cases to copy the whole set of routes corresponding to a given output interface. For instance, in: https://github.com/containers/podman/issues/18539 IPv4 Default Route Does Not Propagate to Pasta Containers on Hetzner VPSes configuring the default gateway won't work without a gateway-less route (specifying the output interface only), because the default gateway is, somewhat dubiously, not on the same subnet as the container. This is a similar case to the one covered by commit 7656a6f88882 ("conf: Adjust netmask on mismatch between IPv4 address/netmask and gateway"), and I'm not exactly proud of that workaround. We also have: https://bugs.passt.top/show_bug.cgi?id=49 pasta does not work with tap-style interface for which, eventually, we should be able to configure a gateway-less route in the target namespace. Introduce different operation modes for nl_route(), including a new NL_DUP one, not exposed yet, which simply parrots back to the kernel the route dump for a given interface from the outer namespace, fixing up flags and interface indices on the way, and requesting to add the same routes in the target namespace, on the interface we manage. For n routes we want to duplicate, send n identical netlink requests including the full dump: routes might depend on each other and the kernel processes RTM_NEWROUTE messages sequentially, not atomically, and repeating the full dump naturally resolves dependencies without the need to actually calculate them. I'm not kidding, it actually works pretty well. Link: https://github.com/containers/podman/issues/18539 Link: https://bugs.passt.top/show_bug.cgi?id=49 Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	util, conf: Add and use ns_is_init() helper	Stefano Brivio	2023-05-23	1	-15/+1
\| \| \| \| \| \| \| \| \| \|	We'll need this in isolate_initial(). While at it, don't rely on BUFSIZ: the earlier issue we had with musl reminded me it's not a magic "everything will fit" value. Size the read buffer to what we actually need from uid_map, and check for the final newline too, because uid_map is organised in lines. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	passt: Relicense to GPL 2.0, or any later version	Stefano Brivio	2023-04-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In practical terms, passt doesn't benefit from the additional protection offered by the AGPL over the GPL, because it's not suitable to be executed over a computer network. Further, restricting the distribution under the version 3 of the GPL wouldn't provide any practical advantage either, as long as the passt codebase is concerned, and might cause unnecessary compatibility dilemmas. Change licensing terms to the GNU General Public License Version 2, or any later version, with written permission from all current and past contributors, namely: myself, David Gibson, Laine Stump, Andrea Bolognani, Paul Holzinger, Richard W.M. Jones, Chris Kuhn, Florian Weimer, Giuseppe Scrivano, Stefan Hajnoczi, and Vasiliy Ulyanov. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	conf: Allow binding to ports on an interface without a specific address	Stefano Brivio	2023-03-29	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Somebody might want to bind listening sockets to a specific interface, but not a specific address, and there isn't really a reason to prevent that. For example: -t %eth0/2022 Alternatively, we support options such as -t 0.0.0.0%eth0/2022 and -t ::%eth0/2022, but not together, for the same port. Enable this kind of syntax and add examples to the man page. Reported-by: Paul Holzinger <pholzing@redhat.com> Link: https://github.com/containers/podman/issues/14425#issuecomment-1485192195 Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	Work around weird false positives with cppcheck-2.9.1	David Gibson	2023-03-21	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 89e38f55 "treewide: Fix header includes to build with musl" added extra #includes to work with musl. Unfortunately with the cppcheck version I'm using (cppcheck-2.9-1.fc37.x86_64 in Fedora 37) this causes weird false positives: specifically cppcheck seems to hit a #error in <bits/unistd.h> complaining about including it directly instead of via <unistd.h> (which is not something we're doing). I have no idea why that would be happening; but I'm guessing it has to be a bug in the cpp implementation in that cppcheck version. In any case, it's possible to work around this by moving the include of <unistd.h> before the include of <signal.h>. So, do that. Fixes: 89e38f55405d ("treewide: Fix header includes to build with musl") Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	conf: Terminate on EMFILE or ENFILE on sockets for port mapping	Stefano Brivio	2023-03-09	1	-7/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In general, we don't terminate or report failures if we fail to bind to some ports out of a given port range specifier, to allow users to conveniently specify big port ranges (or "all") without having to care about ports that might already be in use. However, running out of the open file descriptors quota is a different story: we can't do what the user requested in a very substantial way. For example, if the user specifies '-t all' and we can only bind 1024 sockets, the behaviour is rather unexpected. Fail whenever socket creation returns -ENFILE or -EMFILE. Link: https://bugs.passt.top/show_bug.cgi?id=27 Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	treewide: Fix header includes to build with musl	Chris Kuhn	2023-03-09	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Roughly inspired from a patch by Chris Kuhn: fix up includes so that we can build against musl: glibc is more lenient as headers generally include a larger amount of other headers. Compared to the original patch, I only included what was needed directly in C files, instead of adding blanket includes in local header files. It's a bit more involved, but more consistent with the current (not ideal) situation. Reported-by: Chris Kuhn <kuhnchris+github@kuhnchris.eu> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	conf, passt: Rename stderr to force_stderr	Chris Kuhn	2023-03-09	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \|	While building against musl, gcc informs us that 'stderr' is a protected keyword. This probably comes from a #define stderr (stderr) in musl's stdio.h, to avoid a clash with extern FILE *const stderr, but I didn't really track it down. Just rename it to force_stderr, it makes more sense. [sbrivio: Added commit message] Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	conf, icmp, tcp, udp: Add options to bind to outbound address and interface	Stefano Brivio	2023-03-09	1	-12/+89
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I didn't notice earlier: libslirp (and slirp4netns) supports binding outbound sockets to specific IPv4 and IPv6 addresses, to force the source addresse selection. If we want to claim feature parity, we should implement that as well. Further, Podman supports specifying outbound interfaces as well, but this is simply done by resolving the primary address for an interface when the network back-end is started. However, since kernel version 5.7, commit c427bfec18f2 ("net: core: enable SO_BINDTODEVICE for non-root users"), we can actually bind to a specific interface name, which doesn't need to be validated in advance. Implement -o / --outbound ADDR to bind to IPv4 and IPv6 addresses, and --outbound-if4 and --outbound-if6 to bind IPv4 and IPv6 sockets to given interfaces. Given that it probably makes little sense to select addresses and routes from interfaces different than the ones given for outbound sockets, also assign those as "template" interfaces, by default, unless explicitly overridden by '-i'. For ICMP and UDP, we call sock_l4() to open outbound sockets, as we already needed to bind to given ports or echo identifiers, and we can bind() a socket only once: there, pass address (if any) and interface (if any) for the existing bind() and setsockopt() calls. For TCP, in general, we wouldn't otherwise bind sockets. Add a specific helper to do that. For UDP outbound sockets, we need to know if the final destination of the socket is a loopback address, before we decide whether it makes sense to bind the socket at all: move the block mangling the address destination before the creation of the socket in the IPv4 path. This was already the case for the IPv6 path. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	conf, passt.h: Rename "outbound" interface to "template" interface	Stefano Brivio	2023-03-09	1	-4/+7
\| \| \| \| \| \| \| \| \| \| \| \|	In preparation for the next patch, make it clear that the first routable interface fetched via netlink, or the one configured via -i/--interface, is simply used as template to copy addresses and routes, not an interface we actually use to derive the source address (which will be _bound to_) for outgoing packets. The man page and usage message appear to be already clear enough. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	log, conf, tap: Define die() as err() plus exit(), drop cppcheck workarounds	Stefano Brivio	2023-02-27	1	-3/+0
\| \| \| \| \| \| \| \| \| \| \|	If we define die() as a variadic macro, passing __VA_ARGS__ to err(), and calling exit() outside err() itself, we can drop the workarounds introduced in commit 36f0199f6ef4 ("conf, tap: Silence two false positive invalidFunctionArg from cppcheck"). Suggested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	conf, udp: Allow any loopback address to be used as resolver	Stefano Brivio	2023-02-27	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Andrea reports that with a Fedora 37 guest running on a Fedora 37 host, both using systemd-resolved, with passt connecting them, running with default options, DNS queries don't work. systemd-resolved on the host is reachable only at the loopback address 127.0.0.53. We advertise the default gateway address to the guest as resolver, because our local address is of course unreachable from there, which means we see DNS queries directed to the default gateway, and we redirect them to 127.0.0.1. However, systemd-resolved doesn't answer on 127.0.0.1. To fix this, set @dns_match to the address of the default gateway, unless a different resolver address is explicitly configured, so that we know we explicitly have to map DNS queries, in this case, to the address of the local resolver. This means that in udp_tap_handler() we need to check, first, if the destination address of packets matches @dns_match: even if it's the address of the local gateway, we want to map that to a specific address, which isn't necessarily 127.0.0.1. Do the same for IPv6 for consistency, even though IPv6 defines a single loopback address. Reported-by: Andrea Bolognani <abologna@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Tested-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	conf: Split add_dns{4,6}() out of get_dns()	Stefano Brivio	2023-02-27	1	-35/+51
\| \| \| \| \| \| \| \| \| \| \| \|	The logic handling which resolvers we add, and whether to add them, is getting rather cramped in get_dns(): split it into separate functions. No functional changes intended. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Tested-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	conf: Fix typo and logic in conf_ports() check for port binding2023_02_16.4663ccc	Stefano Brivio	2023-02-16	1	-1/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Ouch, I accidentally pushed the previous change without running the tests: - we need to check, in conf_ports(), that udp_sock_init() managed to bind at least a port, not the opposite - for -T and -U, we have no way to know if we'll manage to bind the port later, so never report an error for those Fixes: 3d0de2c1d727 ("conf, tcp, udp: Exit if we fail to bind sockets for all given ports") Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	conf, tap: Silence two false positive invalidFunctionArg from cppcheck	Stefano Brivio	2023-02-16	1	-0/+3
\| \| \| \| \| \| \| \|	The newly introduced die() calls exit(), but cppcheck doesn't see it and warns about possibly invalid arguments used after the check which triggers die(). Add return statements to silence the warnings. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	conf, tcp, udp: Exit if we fail to bind sockets for all given ports	Stefano Brivio	2023-02-16	1	-13/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	passt supports ranges of forwarded ports as well as 'all' for TCP and UDP, so it might be convenient to proceed if we fail to bind only some of the desired ports. But if we fail to bind even a single port for a given specification, we're clearly, unexpectedly, conflicting with another network service. In that case, report failure and exit. Reported-by: Yalan Zhang <yalzhang@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	log a detailed error (not usage()) when there are extra non-option arguments	Laine Stump	2023-02-16	1	-1/+1
\| \| \| \| \|	Signed-off-by: Laine Stump <laine@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	make conf_netns_opt() exit immediately after logging error	Laine Stump	2023-02-16	1	-12/+4
\| \| \| \| \| \| \|	...and return void to simplify the caller. Signed-off-by: Laine Stump <laine@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	make conf_ugid() exit immediately after logging error	Laine Stump	2023-02-16	1	-18/+9
\| \| \| \| \| \| \|	Again, it can then be made to return void, simplifying the caller. Signed-off-by: Laine Stump <laine@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	make conf_pasta_ns() exit immediately after logging error	Laine Stump	2023-02-16	1	-24/+11
\| \| \| \| \| \| \|	As with conf_ports, this allows us to make the function return void. Signed-off-by: Laine Stump <laine@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	make conf_ports() exit immediately after logging error	Laine Stump	2023-02-16	1	-24/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Rather than having conf_ports() (possibly) log an error, and then letting the caller log the entire usage() message and exit, just log the errors and exit immediately (using die()). For some errors, conf_ports would previously not log any specific message, leaving it up to the user to determine the problem by guessing. We replace all of those silent returns with die() (logging a specific error), thus permitting us to make conf_ports() return void, which simplifies the caller. While modifying the two callers to conf_ports() to not check for a return value, we can further simplify the code by removing the check for a non-null optarg, as that is guaranteed to never happen (due to prior calls to getopt_long() with "argument required" for all relevant options - getopt_long() would have already caught this error). Signed-off-by: Laine Stump <laine@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	eliminate most calls to usage() in conf()	Laine Stump	2023-02-16	1	-214/+122
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Nearly all of the calls to usage() in conf() occur immediately after logging a more detailed error message, and the fact that these errors are occuring indicates that the user has already seen the passt usage message (or read the manpage). Spamming the logfile with the complete contents of the usage message serves only to obscure the more detailed error message. The only time when the full usage message should be output is if the user explicitly asks for it with -h (or its synonyms) As a start to eliminating the excessive calls to usage(), this patch replaces most calls to err() followed by usage() with a call to die() instead. A few other usage() calls remain, but their removal involves bit more nuance that should be properly explained in separate commit messages. Signed-off-by: Laine Stump <laine@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	passt, tap: Add --fd option	Richard W.M. Jones	2022-11-25	1	-2/+26
\| \| \| \| \| \| \| \| \|	This passes a fully connected stream socket to passt. Signed-off-by: Richard W.M. Jones <rjones@redhat.com> [sbrivio: reuse fd_tap instead of adding a new descriptor, imply --one-off on --fd, add to optstring and usage()] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	tcp: Separate helpers to create ns listening sockets	David Gibson	2022-11-25	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	tcp_sock_init*() can create either sockets listening on the host, or in the pasta network namespace (with @ns==1). There are, however, a number of differences in how these two cases work in practice though. "ns" sockets are only used in pasta mode, and they always lead to spliced connections only. The functions are also only ever called in "ns" mode with a NULL address and interface name, and it doesn't really make sense for them to be called any other way. Later changes will introduce further differences in behaviour between these two cases, so it makes more sense to use separate functions for creating the ns listening sockets than the regular external/host listening sockets. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	style: Minor corrections to function comments	David Gibson	2022-11-25	1	-3/+3
\| \| \| \| \| \| \|	Some style issues and a typo. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	conf, udp: Drop mostly duplicated dns_send arrays, rename related fields	Stefano Brivio	2022-11-16	1	-43/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Given that we use just the first valid DNS resolver address configured, or read from resolv.conf(5) on the host, to forward DNS queries to, in case --dns-forward is used, we don't need to duplicate dns[] to dns_send[]: - rename dns_send[] back to dns[]: those are the resolvers we advertise to the guest/container - for forwarding purposes, instead of dns[], use a single field (for each protocol version): dns_host - and rename dns_fwd to dns_match, so that it's clear this is the address we are matching DNS queries against, to decide if they need to be forwarded Suggested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	conf: Fix mask calculation from prefix_len in conf_print()2022_11_10.4129764	Stefano Brivio	2022-11-10	1	-1/+4
\| \| \| \| \| \| \|	Reported-by: Paul Holzinger <pholzing@redhat.com> Fixes: dd09cceaee21 ("Minor improvements to IPv4 netmask handling") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	conf: Split the notions of read DNS addresses and offered ones	Stefano Brivio	2022-11-04	1	-14/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With --dns-forward, if the host has a loopback address configured as DNS server, we should actually use it to forward queries, but, if --no-map-gw is passed, we shouldn't offer the same address via DHCP, NDP and DHCPv6, because it's not going to be reachable. Problematic configuration: * systemd-resolved configuring the usual 127.0.0.53 on the host: we read that from /etc/resolv.conf * --dns-forward specified with an unrelated address, for example 198.51.100.1 We still want to forward queries to 127.0.0.53, if we receive one directed to 198.51.100.1, so we can't drop 127.0.0.53 from our list: we want to use it for forwarding. At the same time, we shouldn't offer 127.0.0.53 to the guest or container either. With this change, I'm only covering the case of automatically configured DNS servers from /etc/resolv.conf. We could extend this to addresses configured with command-line options, but I don't really see a likely use case at this point. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	conf: Adjust netmask on mismatch between IPv4 address/netmask and gateway	Stefano Brivio	2022-11-04	1	-1/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Seen in a Google Compute Engine environment with a machine configured via cloud-init-dhcp, while testing Podman integration for pasta: the assigned address has a /32 netmask, and there's a default route, which can be added on the host because there's another route, also /32, pointing to the default gateway. For example, on the host: ip -4 address add 10.156.0.2/32 dev eth0 ip -4 route add 10.156.0.1/32 dev eth0 ip -4 route add default via 10.156.0.1 This is not a valid configuration as far as I can tell: if the address is configured as /32, it shouldn't be used to reach a gateway outside its derived netmask. However, Linux allows that, and everything works. The problem comes when pasta --config-net sources address and default route from the host, and it can't configure the route in the target namespace because the gateway is invalid. That is, we would skip configuring the first route in the example, which results in the equivalent of doing: ip -4 address add 10.156.0.2/32 dev eth0 ip -4 route add default via 10.156.0.1 where, at this point, 10.156.0.1 is unreachable, and hence invalid as a gateway. Sourcing more routes than just the default is doable, but probably undesirable: pasta users want to provide connectivity to a container, not reflect exactly whatever trickery is configured on the host. Add a consistency check and an adjustment: if the configured default gateway is not reachable, shrink the given netmask until we can reach it. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	Use typing to reduce chances of IPv4 endianness errors	David Gibson	2022-11-04	1	-26/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We recently corrected some errors handling the endianness of IPv4 addresses. These are very easy errors to make since although we mostly store them in network endianness, we sometimes need to manipulate them in host endianness. To reduce the chances of making such mistakes again, change to always using a (struct in_addr) instead of a bare in_addr_t or uint32_t to store network endian addresses. This makes it harder to accidentally do arithmetic or comparisons on such addresses as if they were host endian. We introduce a number of IN4_IS_ADDR_*() helpers to make it easier to directly work with struct in_addr values. This has the additional benefit of making the IPv4 and IPv6 paths more visually similar. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	Use IPV4_IS_LOOPBACK more widely	David Gibson	2022-11-04	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \|	This macro checks if an IPv4 address is in the loopback network (127.0.0.0/8). There are two places where we open code an identical check, use the macro instead. There are also a number of places we specifically exclude the loopback address (127.0.0.1), but we should actually be excluding anything in the loopback network. Change those sites to use the macro as well. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	Minor improvements to IPv4 netmask handling	David Gibson	2022-11-04	1	-24/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are several minor problems with our parsing of IPv4 netmasks (-n). First, we don't reject nonsensical netmasks like 0.255.0.255. Address this structurally by using prefix length instead of netmask as the primary variable, only converting (and validating) when we need to. This has the added benefit of making some things more uniform with the IPv6 path. Second, when the user specifies a prefix length, we truncate the output from strtol() to an integer, which means we would treat -n 4294967320 as valid (equivalent to 24). Fix types to check for this. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	Correct some missing endian conversions of IPv4 addresses	David Gibson	2022-11-04	1	-13/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The INADDR_LOOPBACK constant is in host endianness, and similarly the IN_MULTICAST macro expects a host endian address. However, there are some places in passt where we use those with network endian values. This means that passt will incorrectly allow you to set 127.0.0.1 or a multicast address as the guest address or DNS forwarding address. Add the necessary conversions to correct this. INADDR_ANY and INADDR_BROADCAST logically behave the same way, although because they're palindromes it doesn't have an effect in practice. Change them to be logically correct while we're there, though. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	conf, passt.1: Don't imply --foreground with --debug	Stefano Brivio	2022-10-27	1	-4/+3
\| \| \| \| \| \| \| \| \| \|	Having -f implied by -d (and --trace) usually saves some typing, but debug mode in background (with a log file) is quite useful if pasta is started by Podman, and is probably going to be handy for passt with libvirt later, too. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	conf: Don't pass leading ~ to parse_port_range() on exclusions2022_10_24.c11277b	Stefano Brivio	2022-10-24	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 84fec4e998b6 ("Clean up parsing of port ranges") drops the strspn() call before the parsing of excluded port ranges, because now we're checking against any stray characters at every step. However, that also has the effect of passing ~ as first character to the new parse_port_range(), which makes no sense: we already checked that ~ is the first character before the call, so skip it. Alona reported this output: Invalid port specifier ~15000,~15001,~15006,~15008,~15020,~15021,~15090 while the whole specifier is indeed valid. Reported-by: Alona Paz <alkaplan@redhat.com> Fixes: 84fec4e998b6 ("Clean up parsing of port ranges") Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	conf: Bind inbound ports with CAP_NET_BIND_SERVICE before isolate_user()	Stefano Brivio	2022-10-15	1	-36/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Even if CAP_NET_BIND_SERVICE is granted, we'll lose the capability in the target user namespace as we isolate the process, which means we're unable to bind to low ports at that point. Bind inbound ports, and only those, before isolate_user(). Keep the handling of outbound ports (for pasta mode only) after the setup of the namespace, because that's where we'll bind them. To this end, initialise the netlink socket for the init namespace before isolate_user() as well, as we actually need to know the addresses of the upstream interface before binding ports, in case they're not explicitly passed by the user. As we now call nl_sock_init() twice, checking its return code from conf() twice looks a bit heavy: make it exit(), instead, as we can't do much if we don't have netlink sockets. While at it: - move the v4_only && v6_only options check just after the first option processing loop, as this is more strictly related to option parsing proper - update the man page, explaining that CAP_NET_BIND_SERVICE is not the preferred way to bind ports, because passt and pasta can be abused to allow other processes to make effective usage of it. Add a note about the recommended sysctl instead - simplify nl_sock_init_do() now that it's called once for each case Reported-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	isolation: Only configure UID/GID mappings in userns when spawning shell	David Gibson	2022-10-15	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When in passt mode, or pasta mode spawning a command, we create a userns for ourselves. This is used both to isolate the pasta/passt process itself and to run the spawned command, if any. Since eed17a47 "Handle userns isolation and dropping root at the same time" we've handled both cases the same, configuring the UID and GID mappings in the new userns to map whichever UID we're running as to root within the userns. This mapping is desirable when spawning a shell or other command, so that the user gets a root shell with reasonably clear abilities within the userns and netns. It's not necessarily essential, though. When not spawning a shell, it doesn't really have any purpose: passt itself doesn't need to be root and can operate fine with an unmapped user (using some of the capabilities we get when entering the userns instead). Configuring the uid_map can cause problems if passt is running with any capabilities in the initial namespace, such as CAP_NET_BIND_SERVICE to allow it to forward low ports. In this case the kernel makes files in /proc/pid owned by root rather than the starting user to prevent the user from interfering with the operation of the capability-enhanced process. This includes uid_map meaning we are not able to write to it. Whether this behaviour is correct in the kernel is debatable, but in any case we might as well avoid problems by only initializing the user mappings when we really want them. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	isolation: Replace drop_caps() with a version that actually does something	David Gibson	2022-10-15	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current implementation of drop_caps() doesn't really work because it attempts to drop capabilities from the bounding set. That's not the set that really matters, it's about limiting the abilities of things we might later exec() rather than our own capabilities. It also requires CAP_SETPCAP which we won't usually have. Replace it with a new version which uses setcap(2) to drop capabilities from the effective and permitted sets. For now we leave the inheritable set as is, since we don't want to preclude the user from passing inheritable capabilities to the command spawed by pasta. Correctly dropping caps reveals that we were relying on some capabilities we'd supposedly dropped. Re-divide the dropping of capabilities between isolate_initial(), isolate_user() and isolate_prefork() to make this work. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	conf: Drop excess colons in usage for DHCP and DNS options	Stefano Brivio	2022-10-15	1	-4/+4
\| \| \| \| \|	Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	conf: Report usage for --no-netns-quit	Stefano Brivio	2022-10-15	1	-0/+2
\| \| \| \| \| \|	Fixes: 745a9ba4284c ("pasta: By default, quit if filesystem-bound net namespace goes away") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	conf, tcp, udp: Allow specification of interface to bind to	Stefano Brivio	2022-10-15	1	-10/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since kernel version 5.7, commit c427bfec18f2 ("net: core: enable SO_BINDTODEVICE for non-root users"), we can bind sockets to interfaces, if they haven't been bound yet (as in bind()). Introduce an optional interface specification for forwarded ports, prefixed by %, that can be passed together with an address. Reported use case: running local services that use ports we want to have externally forwarded: https://github.com/containers/podman/issues/14425 Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	conf, tap: Add option to quit once the client closes the connection	Stefano Brivio	2022-10-15	1	-1/+16
\| \| \| \| \| \| \| \|	This is practical to avoid explicit lifecycle management in users, e.g. libvirtd, and is trivial to implement. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	conf, log, Makefile: Add versioning information	Stefano Brivio	2022-10-15	1	-0/+8
\| \| \| \| \| \| \|	Add a --version option displaying that, and also include this information in the log files. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	log, conf: Add support for logging to file	Stefano Brivio	2022-10-14	1	-3/+46
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In some environments, such as KubeVirt pods, we might not have a system logger available. We could choose to run in foreground, but this takes away the convenient synchronisation mechanism derived from forking to background when interfaces are ready. Add optional logging to file with -l/--log-file and --log-size. Unfortunately, this means we need to duplicate features that are more appropriately implemented by a system logger, such as rotation. Keep that reasonably simple, by using fallocate() with range collapsing where supported (Linux kernel >= 3.15, extent-based ext4 and XFS) and falling back to an unsophisticated block-by-block moving of entries toward the beginning of the file once we reach the (mandatory) size limit. While at it, clarify the role of LOG_EMERG in passt.c. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	conf: Drop duplicate, diverging optstring assignments	Stefano Brivio	2022-10-14	1	-14/+6
\| \| \| \| \| \| \| \| \| \| \| \| \|	This originated as a result of copy and paste to introduce a second stage for processing options related to port forwarding, has already bitten David in the past, and just gave me hours of fun. As a matter of fact, the second set of optstring assignments was already incorrect, but it didn't matter because the first one was more restrictive, not allowing optional arguments for -P, -D, -S. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	Move logging functions to a new file, log.c	Stefano Brivio	2022-10-14	1	-0/+1
\| \| \| \| \| \| \| \|	Logging to file is going to add some further complexity that we don't want to squeeze into util.c. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	cppcheck: Use inline suppression for strtok() in conf.c	David Gibson	2022-09-29	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \|	strtok() is non-reentrant and old-fashioned, so cppcheck would complains about its use in conf.c if it weren't suppressed. We're single threaded and strtok() is convenient though, so it's not really worth reworking at this time. Convert this to an inline suppression so it's adjacent to the code its annotating. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	Don't shadow 'i' in conf_ports()	David Gibson	2022-09-29	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The counter 'i' is used in a number of places in conf_ports(), but in one of those we unnecessarily shadow it in an inner scope. We could re-use the same 'i' every time, but each use is logically separate, so instead remove the outer declaration and declare it locally in each of the clauses where we need it. While we're there change it from a signed to unsigned int, since it's used to iterate over port numbers which are generally treated as unsigned. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	Clean up parsing in conf_runas()	David Gibson	2022-09-29	1	-45/+50
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	conf_runas() handles several of the different possible cases for the --runas argument in a slightly odd order. Although it can parse both numeric UIDs/GIDs and user/group names, it can't parse a numeric UID combined with a group name or vice versa. That's not obviously useful, but it's slightly surprising gap to have. Rework the parsing to be more systematic: first split the option into user and (optional) group parts, then separately parse each part as either numeric or a name. As a bonus this removes some clang-tidy warnings. While we're there also add cppcheck suppressions for getpwnam() and getgrnam(). It complains about those because they're not reentrant. passt is single threaded though, and is always likely to be during this initialization code, even if we multithread later. There were some existing suppressions for these in the cppcheck invocation but they're no longer up to date. Replace them with inline suppressions which, being next to the code, are more likely to stay correct. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	Clean up parsing of port ranges	David Gibson	2022-09-29	1	-140/+102
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	conf_ports() parses ranges of ports for the -t, -u, -T and -U options. The code is quite difficult to the follow, to the point that clang-tidy and cppcheck disagree on whether one of the pointers can be NULL at some points. Rework the code with the use of two new helper functions: * parse_port_range() operates a bit like strtoul(), but can parse a whole port range specification (e.g. '80' or '1000-1015') * next_chunk() does the necessary wrapping around strchr() to advance to just after the next given delimiter, while cleanly handling if there are no more delimiters The new version is easier to follow, and also removes some cppcheck warnings. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>