passt - Plug A Simple Socket Transport

	Commit message (Collapse)	Author	Age	Files	Lines
*	README: Update Features section, plus minor improvements2023_03_09.7c7625d	Stefano Brivio	2023-03-09	1	-47/+28
\| \| \| \| \| \| \|	...it's been a while. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	contrib: Drop libvirt out-of-tree patch, integration mostly works in 9.1.0	Stefano Brivio	2023-03-09	1	-427/+0
\| \| \| \| \| \| \| \|	...and in any case, this patch doesn't offer any advantage over the current upstream integration. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	contrib: Drop QEMU out-of-tree patches	Stefano Brivio	2023-03-09	2	-208/+0
\| \| \| \| \| \| \|	Native support was introduced with commit 13c6be96618c, QEMU 7.2. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	contrib: Drop Podman out-of-tree patch, integration is upstream now	Stefano Brivio	2023-03-09	1	-605/+0
\| \| \| \| \| \| \| \|	See https://github.com/containers/podman/pull/16141, shipped in Podman 4.4. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	tcp: Clamp MSS value when queueing data to tap, also for pasta	Stefano Brivio	2023-03-09	1	-14/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Tom reports that a pattern of repated ~1 MiB chunks downloads over NNTP over TLS, on Podman 4.4 using pasta as network back-end, results in pasta taking one full CPU thread after a while, and the download never succeeds. On that setup, we end up re-sending the same frame over and over, with a consistent 65 534 bytes size, and never get an acknowledgement from the tap-side client. This only happens for the default MTU value (65 520 bytes) or for values that are slightly smaller than that (down to 64 499 bytes). We hit this condition because the MSS value we use in tcp_data_from_sock(), only in pasta mode, is simply clamped to USHRT_MAX, and not to the actual size of the buffers we pre-cooked for sending, which is a bit less than that. It looks like we got away with it until commit 0fb7b2b9080a ("tap: Use different io vector bases depending on tap type") fixed the setting of iov_len. Luckily, since it's pasta, we're queueing up to two frames at a time, so the worst that can happen is a badly segmented TCP stream: we always have some space at the tail of the buffer. Clamp the MSS value to the appropriate maximum given by struct tcp{4,6}_buf_data_t, no matter if we're running in pasta or passt mode. While at it, fix the comments to those structs to reflect the current struct size. This is not really relevant for any further calculation or consideration, but it's convenient to know while debugging this kind of issues. Thanks to Tom for reporting the issue in a very detailed way and for providing a test setup. Reported-by: Tom Mombourquette <tom@devnode.com> Link: https://github.com/containers/podman/issues/17703 Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	conf: Terminate on EMFILE or ENFILE on sockets for port mapping	Stefano Brivio	2023-03-09	1	-7/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In general, we don't terminate or report failures if we fail to bind to some ports out of a given port range specifier, to allow users to conveniently specify big port ranges (or "all") without having to care about ports that might already be in use. However, running out of the open file descriptors quota is a different story: we can't do what the user requested in a very substantial way. For example, if the user specifies '-t all' and we can only bind 1024 sockets, the behaviour is rather unexpected. Fail whenever socket creation returns -ENFILE or -EMFILE. Link: https://bugs.passt.top/show_bug.cgi?id=27 Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	tcp, udp: Fix partial success return codes in {tcp,udp}_sock_init()	Stefano Brivio	2023-03-09	2	-28/+23
\| \| \| \| \| \| \| \| \| \| \| \|	The comments say we should return 0 on partial success, and an error code on complete failure. Rationale: if the user configures a port forwarding, and we succeed to bind that port for IPv4 or IPv6 only, that might actually be what the user intended. Adjust the two functions to reflect the comments. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	tcp, udp, util: Pass socket creation errors all the way up	Stefano Brivio	2023-03-09	3	-32/+39
\| \| \| \| \| \| \| \| \|	...starting from sock_l4(), pass negative error (errno) codes instead of -1. They will only be used in two commits from now, no functional changes intended here. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	util: Carry own definition of __bswap_constant{16,32}	Stefano Brivio	2023-03-09	1	-0/+11
\| \| \| \| \| \| \| \| \| \|	musl doesn't define those, use our own definition there. This is a trivial implementation, similar to what's shipped by e.g. uClibc, glibc, libiio. Reported-by: Chris Kuhn <kuhnchris+github@kuhnchris.eu> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	treewide: Fix header includes to build with musl	Chris Kuhn	2023-03-09	9	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Roughly inspired from a patch by Chris Kuhn: fix up includes so that we can build against musl: glibc is more lenient as headers generally include a larger amount of other headers. Compared to the original patch, I only included what was needed directly in C files, instead of adding blanket includes in local header files. It's a bit more involved, but more consistent with the current (not ideal) situation. Reported-by: Chris Kuhn <kuhnchris+github@kuhnchris.eu> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	conf, passt: Rename stderr to force_stderr	Chris Kuhn	2023-03-09	3	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \|	While building against musl, gcc informs us that 'stderr' is a protected keyword. This probably comes from a #define stderr (stderr) in musl's stdio.h, to avoid a clash with extern FILE *const stderr, but I didn't really track it down. Just rename it to force_stderr, it makes more sense. [sbrivio: Added commit message] Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	netlink: Use 8 KiB * netlink message header size as response buffer	Stefano Brivio	2023-03-09	1	-6/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	...instead of BUFSIZ. On musl, BUFSIZ is 1024, so we'll typically truncate the response to the request we send in nl_link(). It's usually 8192 or more with glibc. There doesn't seem to be any macro defining the rtnetlink maximum message size, and iproute2 just hardcodes 1024 * 1024 for the receive buffer, but the example in netlink(7) makes somewhat sense, looking at the kernel implementation. It's not very clean, but we're very unlikely to hit that limit, and if we do, we'll find out painlessly, because NLA_OK() will tell us right away. Reported-by: Chris Kuhn <kuhnchris+passt@kuhnchris.eu> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	conf, icmp, tcp, udp: Add options to bind to outbound address and interface	Stefano Brivio	2023-03-09	6	-35/+243
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I didn't notice earlier: libslirp (and slirp4netns) supports binding outbound sockets to specific IPv4 and IPv6 addresses, to force the source addresse selection. If we want to claim feature parity, we should implement that as well. Further, Podman supports specifying outbound interfaces as well, but this is simply done by resolving the primary address for an interface when the network back-end is started. However, since kernel version 5.7, commit c427bfec18f2 ("net: core: enable SO_BINDTODEVICE for non-root users"), we can actually bind to a specific interface name, which doesn't need to be validated in advance. Implement -o / --outbound ADDR to bind to IPv4 and IPv6 addresses, and --outbound-if4 and --outbound-if6 to bind IPv4 and IPv6 sockets to given interfaces. Given that it probably makes little sense to select addresses and routes from interfaces different than the ones given for outbound sockets, also assign those as "template" interfaces, by default, unless explicitly overridden by '-i'. For ICMP and UDP, we call sock_l4() to open outbound sockets, as we already needed to bind to given ports or echo identifiers, and we can bind() a socket only once: there, pass address (if any) and interface (if any) for the existing bind() and setsockopt() calls. For TCP, in general, we wouldn't otherwise bind sockets. Add a specific helper to do that. For UDP outbound sockets, we need to know if the final destination of the socket is a loopback address, before we decide whether it makes sense to bind the socket at all: move the block mangling the address destination before the creation of the socket in the IPv4 path. This was already the case for the IPv6 path. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	conf, passt.h: Rename "outbound" interface to "template" interface	Stefano Brivio	2023-03-09	2	-6/+9
\| \| \| \| \| \| \| \| \| \| \| \|	In preparation for the next patch, make it clear that the first routable interface fetched via netlink, or the one configured via -i/--interface, is simply used as template to copy addresses and routes, not an interface we actually use to derive the source address (which will be _bound to_) for outgoing packets. The man page and usage message appear to be already clear enough. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	contrib/selinux: Let interface users set paths for log, PID, socket files	Stefano Brivio	2023-03-09	1	-1/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Even libvirt itself will configure passt to write log, PID and socket files to different locations depending on whether the domain is started as root (/var/log/libvirt/...) or as a regular user (/var/log/<PID>/libvirt/...), and user_tmp_t would only cover the latter. Create interfaces for log and PID files, so that callers can specify different file contexts for those, and modify the interface for the UNIX socket file to allow different paths as well. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Tested-by: Laine Stump <laine@redhat.com> Reviewed-by: Laine Stump <laine@redhat.com>
*	contrib/selinux: Allow binding and connecting to all UDP and TCP ports	Stefano Brivio	2023-03-09	1	-12/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Laine reports that with a simple: <portForward proto='tcp'> <range start='2022' to='22'/> </portForward> in libvirt's domain XML, passt won't start as it fails to bind arbitrary ports. That was actually the intention behind passt_port_t: the user or system administrator should have explicitly configured allowed ports on a given machine. But it's probably not realistic, so just allow any port to be bound and forwarded. Also fix up some missing operations on sockets. Reported-by: Laine Stump <laine@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Tested-by: Laine Stump <laine@redhat.com> Reviewed-by: Laine Stump <laine@redhat.com>
*	contrib/selinux: Let passt write to stdout and stderr when it starts	Stefano Brivio	2023-03-09	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	Otherwise, it's unusable as stand-alone tool, or in foreground mode, and it's also impossible to get output from --help or --version, because for SELinux it's just a daemon. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Tested-by: Laine Stump <laine@redhat.com> Reviewed-by: Laine Stump <laine@redhat.com>
*	contrib/selinux: Drop duplicate init_daemon_domain() rule	Stefano Brivio	2023-03-09	1	-1/+0
\| \| \| \| \| \|	Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Tested-by: Laine Stump <laine@redhat.com> Reviewed-by: Laine Stump <laine@redhat.com>
*	udp: Fix signedness warning on 32-bits architectures	Stefano Brivio	2023-03-09	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a ssize_t is an int: udp.c: In function ‘udp_sock_handler’: udp.c:774:23: warning: comparison of integer expressions of different signedness: ‘unsigned int’ and ‘ssize_t’ {aka ‘int’} [-Wsign-compare] 774 \| for (i = 0; i < n; i += m) { \| ^ udp.c:781:43: warning: comparison of integer expressions of different signedness: ‘unsigned int’ and ‘ssize_t’ {aka ‘int’} [-Wsign-compare] 781 \| for (m = 1; i + m < n; m++) { \| Change 'i' and 'm' counters in udp_sock_handler() to signed versions, to match ssize_t n. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	Makefile: Fix SuperH 4 builds: it's AUDIT_ARCH_SH, not AUDIT_ARCH_SH4	Stefano Brivio	2023-03-09	1	-0/+1
\| \| \| \| \| \| \|	sh4 builds currently fail because of this: https://buildd.debian.org/status/fetch.php?pkg=passt&arch=sh4&ver=0.0%7Egit20230216.4663ccc-1&stamp=1677091350&raw=0 Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	Makefile, seccomp.sh: Fix cross-builds, adjust syscalls list to compiler	Stefano Brivio	2023-03-09	2	-3/+6
	Debian cross-building automatic checks: http://crossqa.debian.net/src/passt currently fail because we don't use the right target architecture and compiler while building the system call lists and resolving their numbers in seccomp.sh. Pass ARCH and CC to seccomp.sh and use them. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>