passt - Plug A Simple Socket Transport

	Commit message	Author	Age	Files	Lines
...
*	demo/pasta: Clean up before rebuilding with -g	Stefano Brivio	2022-03-01	1	-0/+1
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	arp, dhcp: Fix strict aliasing warnings reported by gcc 4.9 with -Ofast	Stefano Brivio	2022-02-28	2	-8/+9
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	passt, pasta: Run-time selection of AVX2 build	Stefano Brivio	2022-02-28	9	-48/+92
	Build-time selection of AVX2 flags and routines is not practical for distributions, but limiting AVX2 usage to checksum routines with specific run-time detection doesn't allow for easy performance gains from auto-vectorisation of batched packet handling routines.
	For x86_64, build non-AVX2 and AVX2 binaries, and implement a simple wrapper replacing the current executable with the AVX2 build if it's available, and if AVX2 is supported by the current CPU.
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
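A wrapper of the kind described in this commit message could look roughly like the sketch below. It is not taken from the passt sources: the function names, the ".avx2" suffix and the use of /proc/self/exe to find the running binary are assumptions made for illustration. The CPU check relies on the GCC/clang builtins __builtin_cpu_init() and __builtin_cpu_supports().

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write "<exe>.avx2" into buf.
 * Returns 0 on success, -1 if the result doesn't fit. */
int avx2_path(const char *exe, char *buf, size_t len)
{
	int n;

	assert(exe && buf);

	n = snprintf(buf, len, "%s.avx2", exe);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Returns 1 if the running CPU reports AVX2 support, 0 otherwise
 * (always 0 on non-x86_64 builds). */
int cpu_has_avx2(void)
{
#if defined(__x86_64__) && defined(__GNUC__)
	__builtin_cpu_init();
	return __builtin_cpu_supports("avx2") ? 1 : 0;
#else
	return 0;
#endif
}

/* Replace the current process with the AVX2 build, if the CPU supports
 * it and the binary can be executed. Returns only if we keep running
 * the generic build. */
void avx2_exec(char **argv)
{
	char exe[4096], target[4096 + sizeof(".avx2")];
	ssize_t n;

	if (!cpu_has_avx2())
		return;

	n = readlink("/proc/self/exe", exe, sizeof(exe) - 1);
	if (n < 0)
		return;
	exe[n] = '\0';

	if (avx2_path(exe, target, sizeof(target)))
		return;

	execv(target, argv);	/* on failure, just carry on as we are */
}
```

Calling avx2_exec(argv) first thing in main() keeps the fallback trivial: if anything fails, the generic binary simply continues.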
*	test/distro/opensuse: Add Tumbleweed armv7l test	Stefano Brivio	2022-02-28	1	-0/+29
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	test/lib/term: Don't run demo when started as ./run	Stefano Brivio	2022-02-28	1	-1/+1
	I changed this in a previous commit by mistake, restore the original command.
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	seccomp, tcp: Add fcntl64 to pasta syscalls for armv6l, armv7l	Stefano Brivio	2022-02-28	1	-1/+1
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	hooks/pre-push: Keep original cast on gzip, fix uploading with dash	Stefano Brivio	2022-02-28	1	-2/+3
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	demo/pasta: Exit namespace in 'ns' pane before restarting pasta	Stefano Brivio	2022-02-28	1	-0/+1
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	seccomp: Adjust list of allowed syscalls for armv6l, armv7l	Stefano Brivio	2022-02-26	3	-5/+9
	It looks like glibc commonly implements clock_gettime(2) with clock_gettime64(), and uses recv() instead of recvfrom(), send() instead of sendto(), and sigreturn() instead of rt_sigreturn() on armv6l and armv7l.
	Adjust the list of system calls for armv6l and armv7l accordingly.
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	passt: Don't warn on failed madvise()	Stefano Brivio	2022-02-26	1	-2/+1
	A kernel might not be configured with CONFIG_TRANSPARENT_HUGEPAGE, especially on embedded systems. Ignore the error, it doesn't affect functionality.
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	Makefile: Fix up AUDIT_ARCH for armv6l, armv7l	Stefano Brivio	2022-02-26	1	-0/+1
	There's a single AUDIT_ARCH_ARM define available (and big-endian shouldn't be a concern with those).
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	tap: Cast ETH_MAX_MTU to signed in comparisons	Stefano Brivio	2022-02-26	1	-2/+2
	At least gcc 8.3 and 10.2 emit a warning on armv6l and armv7l.
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
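The warning mentioned here is most likely a signed/unsigned comparison one: on 32-bit ARM, a signed length and an unsigned constant have the same width, so the signed operand gets converted to unsigned before comparing. The sketch below illustrates the effect with fixed-width types. MAX_MTU is a hypothetical stand-in for ETH_MAX_MTU (assumed to be an unsigned constant), and the function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for ETH_MAX_MTU, assumed unsigned */
#define MAX_MTU		0xFFFFU

static_assert((int32_t)MAX_MTU > 0, "constant must fit a signed 32-bit value");

/* What a mixed comparison effectively does when both operands are 32
 * bits wide: the signed value is converted to unsigned first, so a
 * negative length looks huge. */
int too_big_unsigned(int32_t n)
{
	return (uint32_t)n > MAX_MTU;
}

/* Comparison with the constant cast to a signed type: negative values
 * (for example, error returns) are not reported as oversized. */
int too_big_signed(int32_t n)
{
	return n > (int32_t)MAX_MTU;
}
```

With these definitions, too_big_unsigned(-1) is true while too_big_signed(-1) is false, which is why casting the constant both silences the warning and gives the intended result.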
*	seccomp.sh: Handle syscall number defines in the (x + y) form	Stefano Brivio	2022-02-26	1	-0/+3
	This is the case at least for current glibc headers on armv6l and armv7l.
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	udp: Explicitly initialise sin6_scope_id and sin_zero in sockaddr_in{,6}	Stefano Brivio	2022-02-25	1	-0/+2
	Not functionally needed, but gcc versions 7 to 9 (at least) will issue a warning otherwise.
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	passt: Explicitly check return value of chdir()	Stefano Brivio	2022-02-25	1	-1/+3
	...it doesn't actually matter as we're checking errno at the very end, but, depending on build flags, chdir() might be declared with warn_unused_result and the compiler issues a warning.
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	hooks: Uploaded compressed .cast files too	Stefano Brivio	2022-02-23	1	-3/+6
	...to benefit from gzip_static in nginx or equivalent.
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	passt.1: Drop duplicate --dns section	Stefano Brivio	2022-02-23	1	-11/+1
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	conf, ndp: Disable router advertisements on --config-net	Stefano Brivio	2022-02-23	2	-1/+6
	If we statically configure a default route, and also advertise it for SLAAC, the kernel will try moments later to add the same route:
	  ICMPv6: RA: ndisc_router_discovery failed to add default route
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	netlink: Avoid left-over bytes in request on MTU configuration	Stefano Brivio	2022-02-23	1	-4/+7
	When nl_link() configures the MTU, it shouldn't send extra bytes, otherwise we'll get a kernel warning:
	  netlink: 4 bytes leftover after parsing attributes in process `pasta'.
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	test: Fix name of CI asciinema player in perf links handler	Stefano Brivio	2022-02-23	1	-5/+4
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	README: Fix demo div grid layout	Stefano Brivio	2022-02-23	1	-17/+23
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	passt: Drop PASST_LEGACY_NO_OPTIONS sections	Stefano Brivio	2022-02-22	4	-25/+0
	...nobody uses those builds anymore.
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	demo, ci: Switch to asciinema(1) for terminal recordings	Stefano Brivio	2022-02-22	7	-108/+142
	For demos, cool-retro-term(1) looked fancier, but several threads of that and ffmpeg(1) are just getting in the way of performance testing. The CI videos started getting really big as well, and they were difficult to read.
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	test: Add demo for Podman with pasta	Stefano Brivio	2022-02-22	7	-6/+959
	...showing setup steps, some peculiarities such as the --net option, and a general side-by-side comparison with slirp4netns(1), including "quick" TCP and UDP throughput and latency benchmarks.
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	contrib: Add patch for Podman integration	Stefano Brivio	2022-02-21	1	-0/+542
	The patch introduces a "pasta" networking mode for rootless containers, similar to the existing slirp4netns mode. Notable differences are described in the commit message.
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	README, hooks: Build HTML man page on push, add a link	Stefano Brivio	2022-02-21	2	-0/+5
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	man page: Update REPORTING BUGS section	Stefano Brivio	2022-02-21	1	-4/+5
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	test/perf/passt_udp: Drop threshold for 256B test	Stefano Brivio	2022-02-21	1	-2/+2
	That test fails sometimes: it looks like iperf3 is still sending initial messages that are too big. I'll need to figure out why, but given that 256 bytes is not really an expected MTU, drop the thresholds to zero for the time being.
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	test/distro/ubuntu: Use DEBIAN_FRONTEND=noninteractive for apt on 22.04	Stefano Brivio	2022-02-21	1	-0/+1
	Removing the needrestart package doesn't seem to work anymore, and I'm again getting prompts to restart services after installing gcc and make: export DEBIAN_FRONTEND=noninteractive before installing packages to avoid that.
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	pasta: By default, quit if filesystem-bound net namespace goes away	Stefano Brivio	2022-02-21	7	-12/+107
	This should be convenient for users managing filesystem-bound network namespaces: monitor the base directory of the namespace and exit if the namespace given as PATH or NAME target is deleted.
	We can't add an inotify watch directly on the namespace directory, that won't work with nsfs.
	Add an option to disable this behaviour, --no-netns-quit.
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
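Watching the parent directory for the removal of one entry, as this message describes, can be sketched with inotify(7) as follows. This is an illustration only, not the code from the commit: watch_dir() and entry_deleted() are hypothetical names, and the choice of IN_DELETE as the only event of interest is an assumption.

```c
#include <string.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Watch 'dir' for deleted entries.
 * Returns a non-blocking inotify descriptor, or -1 on failure. */
int watch_dir(const char *dir)
{
	int fd = inotify_init1(IN_NONBLOCK | IN_CLOEXEC);

	if (fd < 0)
		return -1;

	if (inotify_add_watch(fd, dir, IN_DELETE) < 0) {
		close(fd);
		return -1;
	}

	return fd;
}

/* Drain pending events from 'fd'.
 * Returns 1 if an entry called 'name' was deleted, 0 otherwise. */
int entry_deleted(int fd, const char *name)
{
	char buf[4096]
		__attribute__ ((aligned(__alignof__(struct inotify_event))));
	int found = 0;
	ssize_t n;

	while ((n = read(fd, buf, sizeof(buf))) > 0) {
		char *p = buf;

		while (p < buf + n) {
			struct inotify_event *ev = (struct inotify_event *)p;

			if ((ev->mask & IN_DELETE) && ev->len &&
			    !strcmp(ev->name, name))
				found = 1;

			p += sizeof(*ev) + ev->len;
		}
	}

	return found;
}
```

A main loop would typically add the descriptor returned by watch_dir() to its poll or epoll set and exit once entry_deleted() reports that the namespace entry is gone.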
*	tcp, udp: Receive batching doesn't pay off when writing single frames to tap	Stefano Brivio	2022-02-21	2	-32/+37
	In pasta mode, when we get data from sockets and write it as single frames to the tap device, we batch receive operations considerably, and then (conceptually) split the data into many smaller writes.
	It looked like an obvious choice, but performance is actually better if we receive data in many small frame-sized recvmsg()/recvmmsg() calls.
	The syscall overhead with the previous behaviour, observed by perf, comes predominantly from write operations, but receiving data in shorter chunks probably improves cache locality by a considerable amount.
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	udp: Allow loopback connections from host using configured unicast address	Stefano Brivio	2022-02-21	1	-2/+18
	Likely for testing purposes only: allow connections from host to guest or namespace using, as connection target, the configured, possibly global unicast address.
	In this case, we have to map the destination address to a link-local address, and for port-based tracked responses, the source address needs to be again the unicast address: not loopback, not link-local.
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	conf, udp: Introduce basic DNS forwarding	Stefano Brivio	2022-02-21	8	-38/+177
	For compatibility with libslirp/slirp4netns users: introduce a mechanism to map, in the UDP routines, an address facing guest or namespace to the first IPv4 or IPv6 address resulting from configuration as resolver. This can be enabled with the new --dns-forward option.
	This implies that sourcing and using DNS addresses and search lists, passed via command line or read from /etc/resolv.conf, is not bound anymore to DHCP/DHCPv6/NDP usage: for example, pasta users might just want to use addresses from /etc/resolv.conf as mapping target, while not passing DNS options via DHCP.
	Reflect this in all the involved code paths by differentiating DHCP/DHCPv6/NDP usage from DNS configuration per se, and in the new options --dhcp-dns, --dhcp-search for pasta, and --no-dhcp-dns, --no-dhcp-search for passt.
	This should be the last bit to enable substantial compatibility between slirp4netns.sh and slirp4netns(1): pass the --dns-forward option from the script too.
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	conf: Given IPv4 address and no netmask, assign RFC 790-style classes	Stefano Brivio	2022-02-21	1	-10/+10
	Provide a sane default, instead of /0, if an address is given, and it doesn't correspond to any host address we could find via netlink.
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
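The RFC 790-style defaults mentioned in this commit map the leading bits of the address to a network size: class A networks use 8 bits, class B 16 and class C 24. A minimal sketch of that mapping follows. The function name is hypothetical, and returning /32 for addresses outside classes A to C is an assumption, not something stated in the commit message.

```c
#include <stdint.h>

/* addr: IPv4 address in host byte order.
 * Returns the classful prefix length for it. */
int classful_prefix_len(uint32_t addr)
{
	uint8_t first = addr >> 24;	/* most significant octet */

	if (first < 128)	/* class A: leading bit 0 */
		return 8;
	if (first < 192)	/* class B: leading bits 10 */
		return 16;
	if (first < 224)	/* class C: leading bits 110 */
		return 24;

	return 32;		/* assumption: no class default beyond C */
}
```

For example, 10.0.0.1 would get /8, 172.16.0.1 would get /16 and 192.168.1.1 would get /24.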
*	conf: Don't print configuration on --quiet	Stefano Brivio	2022-02-21	1	-1/+2
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	slirp4netns.sh: Implement API socket option for port forwarding	Stefano Brivio	2022-02-21	1	-7/+182
	Introduce the equivalent of the --api-socket option from slirp4netns: spawn a subshell to handle requests, netcat binds to a UNIX domain socket and jq parses messages.
	Three minor differences compared to slirp4netns:
	- IPv6 ports are forwarded too
	- error messages are not as specific, for example we don't tell apart malformed JSON requests from invalid parameters
	- host addresses are always 0.0.0.0 and ::1, pasta doesn't bind on specific addresses for different ports
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	Makefile, conf, passt: Drop passt4netns references, explicit argc check	Stefano Brivio	2022-02-21	3	-13/+15
	Nobody currently calls this as passt4netns, that was the name I used before 'pasta', drop any reference before it's too late.
	While at it, explicitly check that argc is bigger than or equal to one, just as a defensive measure: argv[0] being NULL is not an issue anyway.
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	passt: Make process not dumpable after sandboxing	Stefano Brivio	2022-02-21	1	-0/+2
	Two effects:
	- ptrace() on passt and pasta can only be done by root, so that even if somebody gains access to the same user, they won't be able to check data passed in syscalls anyway. No core dumps allowed either
	- /proc/PID files are owned by root:root, and they can't be read by the same user as the one passt or pasta are running with
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
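On Linux, the "dumpable" attribute described above is controlled with prctl(2). The sketch below shows the call that clears it. The wrapper names are hypothetical, and the commit message itself doesn't say which call the change uses, so treat this as an illustration of the mechanism only.

```c
#include <sys/prctl.h>

/* Mark the calling process as not dumpable: no core dumps, and
 * ptrace() attach is limited to privileged users.
 * Returns 0 on success, -1 on error. */
int drop_dumpable(void)
{
	return prctl(PR_SET_DUMPABLE, 0, 0, 0, 0);
}

/* Returns the current "dumpable" flag of the calling process:
 * 0 means not dumpable. */
int get_dumpable(void)
{
	return prctl(PR_GET_DUMPABLE, 0, 0, 0, 0);
}
```

Calling drop_dumpable() once sandboxing is in place matches the ordering given in the commit title.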
*	passt, pasta: Namespace-based sandboxing, defer seccomp policy application	Stefano Brivio	2022-02-21	17	-262/+365
	To reach (at least) a security level conceptually equivalent to the one implemented by --enable-sandbox in slirp4netns, we need to create a new mount namespace and pivot_root() into a new (empty) mountpoint, so that passt and pasta can't access any filesystem resource after initialisation.
	While at it, also detach IPC, PID (only for passt, to prevent vulnerabilities based on the knowledge of a target PID), and UTS namespaces.
	With this approach, if we apply the seccomp filters right after the configuration step, the number of allowed syscalls grows further. To prevent this, defer the application of seccomp policies until after the initialisation phase, right before the main loop: that's where we expect bad things to happen, potentially. This way, we get back to 22 allowed syscalls for passt and 34 for pasta, on x86_64.
	While at it, move #syscalls notes to specific code paths wherever it conceptually makes sense.
	We have to open all the file handles we'll ever need before sandboxing:
	- the packet capture file can only be opened once, drop instance numbers from the default path and use the (pre-sandbox) PID instead
	- /proc/net/tcp{,v6} and /proc/net/udp{,v6}, for automatic detection of bound ports in pasta mode, are now opened only once, before sandboxing, and their handles are stored in the execution context
	- the UNIX domain socket for passt is also bound only once, before sandboxing: to reject clients after the first one, instead of closing the listening socket, keep it open, accept and immediately discard new connections if we already have a valid one
	Clarify the (unchanged) behaviour for --netns-only in the man page.
	To actually make passt and pasta processes run in a separate PID namespace, we need to unshare(CLONE_NEWPID) before forking to background (if configured to do so). Introduce a small daemon() implementation, __daemon(), that additionally saves the PID file before forking. While running in foreground, the process itself can't move to a new PID namespace (a process can't change the notion of its own PID): mention that in the man page.
	For some reason, fork() in a detached PID namespace causes SIGTERM and SIGQUIT to be ignored, even if the handler is still reported as SIG_DFL: add a signal handler that just exits.
	We can now drop most of the pasta_child_handler() implementation, that took care of terminating all processes running in the same namespace, if pasta started a shell: the shell itself is now the init process in that namespace, and all children will terminate once the init process exits.
	Issuing 'echo $$' in a detached PID namespace won't return the actual namespace PID as seen from the init namespace: adapt demo and test setup scripts to reflect that.
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	slirp4netns: Add EXIT as condition for trap	Stefano Brivio	2022-02-04	1	-1/+1
	...otherwise, we don't terminate pasta on regular exit, i.e. on a read from the "exit" file descriptor.
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	slirp4netns: Look up pasta command, exit if not found	Stefano Brivio	2022-02-04	1	-1/+6
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	test/perf/passt_tcp: Drop iperf3 window size for host-to-guest tests	Stefano Brivio	2022-02-01	1	-10/+10
	With a recent 5.15 kernel, passing a huge window size to iperf3 with lower MTUs makes iperf3 stop sending packets after a few seconds -- I haven't investigated this in detail, but the window size will be adjusted dynamically anyway and not passing it doesn't actually affect throughput, so simply drop the option.
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	util: Avoid return of possibly truncated unsigned long in bitmap_isset()	Stefano Brivio	2022-02-01	1	-2/+2
	Oops. If *word & BITMAP_BIT(bit) is bigger than an int (which is the case for half of the possible bits of a bitmap on 64-bit archs), we'll return that as an int, that is, zero, even if the bit at hand is set.
	Just return zero or one there, no callers are interested in the actual bitmap as return value.
	Issue found as pasta wouldn't automatically detect some bound ports.
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
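The truncation described in this message can be reproduced with a few lines of C. The sketch below is not the passt code: the macro and function names are hypothetical stand-ins for BITMAP_BIT() and bitmap_isset(). It assumes a compiler where converting an out-of-range unsigned long to int keeps the low bits, as gcc and clang do.

```c
#include <limits.h>

/* Hypothetical helpers, modelled on the names in the commit message */
#define WORD_BITS	(sizeof(unsigned long) * CHAR_BIT)
#define BIT_MASK(bit)	(1UL << ((bit) % WORD_BITS))
#define WORD_OF(bit)	((bit) / WORD_BITS)

void bit_set(unsigned long *map, unsigned bit)
{
	map[WORD_OF(bit)] |= BIT_MASK(bit);
}

/* Problematic version: the masked word is narrowed to int, so on
 * 64-bit targets bits 32 to 63 of each word are lost. */
int bit_isset_truncating(const unsigned long *map, unsigned bit)
{
	return (int)(map[WORD_OF(bit)] & BIT_MASK(bit));
}

/* Fixed version: collapse the masked word to zero or one first. */
int bit_isset(const unsigned long *map, unsigned bit)
{
	return !!(map[WORD_OF(bit)] & BIT_MASK(bit));
}
```

After bit_set(map, 40) on a 64-bit build, bit_isset(map, 40) returns 1, while the truncating variant returns 0 with the compilers mentioned above.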
*	test/distro/ubuntu: Skip apt-get update for 16.04 on powerpc	Stefano Brivio	2022-01-30	1	-1/+3
	Some recent change to xenial-updates broke dependencies for gcc, it can't be installed anymore. Skipping apt-get update leaves gcc dependencies in a consistent state, though.
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	test/distro: Avoid race between display of ns_msg and netcat exiting	Stefano Brivio	2022-01-30	4	-0/+4
	The shell might report 'nc -6 -l -p 9999 > /tmp/ns_msg' as done even after the subsequent 'echo' is done: wait one second before reading out /tmp/ns_msg, to ensure we read that instead of the "Done" message.
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	passt: Address new clang-tidy warnings from LLVM 13.0.1	Stefano Brivio	2022-01-30	9	-25/+52
	clang-tidy from LLVM 13.0.1 reports some new warnings from these checkers:
	- altera-unroll-loops, altera-id-dependent-backward-branch: ignore for the time being, add a TODO item
	- bugprone-easily-swappable-parameters: ignore, nothing to do about those
	- readability-function-cognitive-complexity: ignore for the time being, add a TODO item
	- altera-struct-pack-align: ignore, alignment is forced in protocol headers
	- concurrency-mt-unsafe: ignore for the time being, add a TODO item
	Fix bugprone-implicit-widening-of-multiplication-result warnings, though, that's doable and they seem to make sense.
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	contrib: Introduce PoC for Kata Containers with user-mode networking	Stefano Brivio	2022-01-28	2	-0/+764
	passt can be used to implement user-mode networking for the Kata Containers runtime, so that networking setup doesn't need elevated privileges or capabilities.
	This commit adds the patch for Kata Containers runtime and agent to support passt as networking model and endpoint, and some basic documentation.
	See contrib/kata-containers/README.md for more details and setup steps.
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	libvirt, qemu: Move patches to new directory, contrib	Stefano Brivio	2022-01-28	3	-0/+0
	I'm about to add a new adaptation carrying out-of-tree patches for a Kata Containers PoC -- move the existing out-of-tree patches to their own directory to keep things easy to find in the main one.
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	qemu: Rebase patch for UNIX domain socket support to latest upstream	Stefano Brivio	2022-01-28	1	-45/+18
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	passt, tap: Daemonise once socket is ready without waiting for connection	Stefano Brivio	2022-01-28	4	-21/+51
	The existing behaviour is not really practical: an automated agent in charge of starting both qemu and passt would need to fork itself to start passt, because passt won't fork to background until qemu connects, and the agent needs to unblock to start qemu.
	Instead of waiting for a connection to daemonise, do it right away as soon as a socket is available: that can be considered an initialised state already.
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>