Commit message | Author | Age | Files | Lines
...
* demo, ci: Switch to asciinema(1) for terminal recordings | Stefano Brivio | 2022-02-22 | 7 | -108/+142
  For demos, cool-retro-term(1) looked fancier, but its several threads, together with ffmpeg(1), are just messing with performance testing. The CI videos started getting really big as well, and they were difficult to read.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test: Add demo for Podman with pasta | Stefano Brivio | 2022-02-22 | 7 | -6/+959
  ...showing setup steps, some peculiarities such as the --net option, and a general side-by-side comparison with slirp4netns(1), including "quick" TCP and UDP throughput and latency benchmarks.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* contrib: Add patch for Podman integration | Stefano Brivio | 2022-02-21 | 1 | -0/+542
  The patch introduces a "pasta" networking mode for rootless containers, similar to the existing slirp4netns mode. Notable differences are described in the commit message.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* README, hooks: Build HTML man page on push, add a link | Stefano Brivio | 2022-02-21 | 2 | -0/+5
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* man page: Update REPORTING BUGS section | Stefano Brivio | 2022-02-21 | 1 | -4/+5
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test/perf/passt_udp: Drop threshold for 256B test | Stefano Brivio | 2022-02-21 | 1 | -2/+2
  That test fails sometimes: it looks like iperf3 is still sending initial messages that are too big. I'll need to figure out why, but given that 256 bytes is not really an expected MTU, drop the thresholds to zero for the time being.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test/distro/ubuntu: Use DEBIAN_FRONTEND=noninteractive for apt on 22.04 | Stefano Brivio | 2022-02-21 | 1 | -0/+1
  Removing the needrestart package doesn't seem to work anymore, and I'm again getting prompts to restart services after installing gcc and make: export DEBIAN_FRONTEND=noninteractive before installing packages to avoid that.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* pasta: By default, quit if filesystem-bound net namespace goes away | Stefano Brivio | 2022-02-21 | 7 | -12/+107
  This should be convenient for users managing filesystem-bound network namespaces: monitor the base directory of the namespace and exit if the namespace given as PATH or NAME target is deleted. We can't add an inotify watch directly on the namespace directory: that won't work with nsfs.
  Add an option to disable this behaviour, --no-netns-quit.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
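A minimal C sketch of the monitoring approach described above, assuming the namespace is bound to a file such as /run/netns/NAME: watch the base directory with inotify(7) for IN_DELETE events and report when the entry with the namespace's name disappears. The helpers netns_watch() and netns_gone() are hypothetical, not pasta's actual functions.

```c
#define _GNU_SOURCE
#include <limits.h>
#include <string.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Hypothetical helper: watch the base directory of a filesystem-bound
 * namespace (e.g. /run/netns) for deleted entries. Returns an inotify
 * file descriptor, or -1 on error. */
static int netns_watch(const char *base_dir)
{
	int fd = inotify_init1(IN_CLOEXEC);

	if (fd < 0)
		return -1;

	/* nsfs files can't be watched directly: watch their directory */
	if (inotify_add_watch(fd, base_dir, IN_DELETE) < 0) {
		close(fd);
		return -1;
	}

	return fd;
}

/* Hypothetical helper: block until events are available, then return 1
 * if one of them reports deletion of 'name', 0 if none does, -1 on read
 * errors. A caller would exit once this returns 1. */
static int netns_gone(int fd, const char *name)
{
	char buf[sizeof(struct inotify_event) + NAME_MAX + 1]
		__attribute__((aligned(__alignof__(struct inotify_event))));
	ssize_t n = read(fd, buf, sizeof(buf));
	char *p;

	if (n <= 0)
		return -1;

	for (p = buf; p < buf + n; ) {
		const struct inotify_event *ev = (const struct inotify_event *)p;

		if (ev->len && !strcmp(ev->name, name))
			return 1;
		p += sizeof(*ev) + ev->len;
	}

	return 0;
}
```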
* tcp, udp: Receive batching doesn't pay off when writing single frames to tap | Stefano Brivio | 2022-02-21 | 2 | -32/+37
  In pasta mode, when we get data from sockets and write it as single frames to the tap device, we batch receive operations considerably, and then (conceptually) split the data into many smaller writes.
  It looked like an obvious choice, but performance is actually better if we receive data with many small, frame-sized recvmsg()/recvmmsg() calls. The syscall overhead with the previous behaviour, observed by perf, comes predominantly from write operations, but receiving data in shorter chunks probably improves cache locality considerably.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* udp: Allow loopback connections from host using configured unicast address | Stefano Brivio | 2022-02-21 | 1 | -2/+18
  Likely for testing purposes only: allow connections from host to guest or namespace using the configured, possibly global, unicast address as connection target. In this case, we have to map the destination address to a link-local address, and for port-based tracked responses, the source address needs to be the unicast address again: not loopback, not link-local.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* conf, udp: Introduce basic DNS forwarding | Stefano Brivio | 2022-02-21 | 8 | -38/+177
  For compatibility with libslirp/slirp4netns users: introduce a mechanism to map, in the UDP routines, an address facing guest or namespace to the first IPv4 or IPv6 address resulting from configuration as resolver. This can be enabled with the new --dns-forward option.
  This implies that sourcing and using DNS addresses and search lists, passed via command line or read from /etc/resolv.conf, is no longer bound to DHCP/DHCPv6/NDP usage: for example, pasta users might just want to use addresses from /etc/resolv.conf as mapping target, while not passing DNS options via DHCP.
  Reflect this in all the involved code paths by differentiating DHCP/DHCPv6/NDP usage from DNS configuration per se, and in the new options --dhcp-dns, --dhcp-search for pasta, and --no-dhcp-dns, --no-dhcp-search for passt.
  This should be the last bit needed to enable substantial compatibility between slirp4netns.sh and slirp4netns(1): pass the --dns-forward option from the script too.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* conf: Given IPv4 address and no netmask, assign RFC 790-style classes | Stefano Brivio | 2022-02-21 | 1 | -10/+10
  Provide a sane default, instead of /0, if an address is given and it doesn't correspond to any host address we could find via netlink.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
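A minimal C sketch of classful defaults in the spirit of RFC 790: class A addresses (leading bit 0) get /8, class B (leading bits 10) get /16, class C (leading bits 110) get /24. The function name is hypothetical, and returning /32 for anything outside classes A, B and C is an assumption of this sketch, not necessarily what passt does.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical helper: default prefix length for an IPv4 address given
 * without a netmask, following classful (RFC 790-style) addressing.
 * 'addr' is in network byte order, as in struct in_addr. */
static int classful_prefix_len(struct in_addr addr)
{
	uint32_t a = ntohl(addr.s_addr);

	if (!(a & 0x80000000U))			/* 0xxx...: class A */
		return 8;
	if ((a & 0xC0000000U) == 0x80000000U)	/* 10xx...: class B */
		return 16;
	if ((a & 0xE0000000U) == 0xC0000000U)	/* 110x...: class C */
		return 24;

	return 32;	/* Assumption: no classful default beyond A, B, C */
}
```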
* conf: Don't print configuration on --quiet | Stefano Brivio | 2022-02-21 | 1 | -1/+2
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* slirp4netns.sh: Implement API socket option for port forwarding | Stefano Brivio | 2022-02-21 | 1 | -7/+182
  Introduce the equivalent of the --api-socket option from slirp4netns: spawn a subshell to handle requests, where netcat binds to a UNIX domain socket and jq parses messages.
  Three minor differences compared to slirp4netns:
  - IPv6 ports are forwarded too
  - error messages are not as specific: for example, we don't tell apart malformed JSON requests from invalid parameters
  - host addresses are always 0.0.0.0 and ::1, as pasta doesn't bind to specific addresses for different ports
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Makefile, conf, passt: Drop passt4netns references, explicit argc check | Stefano Brivio | 2022-02-21 | 3 | -13/+15
  Nobody currently calls this as passt4netns, which was the name I used before 'pasta': drop any reference before it's too late.
  While at it, explicitly check that argc is greater than or equal to one, just as a defensive measure: argv[0] being NULL is not an issue anyway.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Make process not dumpable after sandboxing | Stefano Brivio | 2022-02-21 | 1 | -0/+2
  Two effects:
  - ptrace() on passt and pasta can only be done by root, so that even if somebody gains access to the same user, they won't be able to check data passed in syscalls anyway. No core dumps are allowed either
  - /proc/PID files are owned by root:root, and they can't be read by the same user passt or pasta are running as
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
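The commit message doesn't name the call, but on Linux the documented way to obtain both effects is clearing the "dumpable" attribute with prctl(2). A minimal sketch, assuming that mechanism; make_not_dumpable() is a hypothetical name.

```c
#include <sys/prctl.h>

/* Hypothetical helper, called once sandboxing is complete: clear the
 * "dumpable" attribute, so that only a privileged process can ptrace()
 * us, no core dumps are written, and /proc/PID files are no longer owned
 * by the process's own user. Returns 0 on success, -1 on error. */
static int make_not_dumpable(void)
{
	return prctl(PR_SET_DUMPABLE, 0, 0, 0, 0);
}
```

PR_GET_DUMPABLE returns the current value, which is a convenient way to check the result.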
* passt, pasta: Namespace-based sandboxing, defer seccomp policy application | Stefano Brivio | 2022-02-21 | 17 | -262/+365
  To reach (at least) a conceptually equivalent security level as implemented by --enable-sandbox in slirp4netns, we need to create a new mount namespace and pivot_root() into a new (empty) mountpoint, so that passt and pasta can't access any filesystem resource after initialisation.
  While at it, also detach IPC, PID (only for passt, to prevent vulnerabilities based on the knowledge of a target PID), and UTS namespaces.
  With this approach, if we apply the seccomp filters right after the configuration step, the number of allowed syscalls grows further. To prevent this, defer the application of seccomp policies until after the initialisation phase, right before the main loop: that's where we expect bad things to happen, potentially. This way, we get back to 22 allowed syscalls for passt and 34 for pasta, on x86_64.
  While at it, move #syscalls notes to specific code paths wherever it conceptually makes sense.
  We have to open all the file handles we'll ever need before sandboxing:
  - the packet capture file can only be opened once: drop instance numbers from the default path and use the (pre-sandbox) PID instead
  - /proc/net/tcp{,v6} and /proc/net/udp{,v6}, for automatic detection of bound ports in pasta mode, are now opened only once, before sandboxing, and their handles are stored in the execution context
  - the UNIX domain socket for passt is also bound only once, before sandboxing: to reject clients after the first one, instead of closing the listening socket, keep it open, and accept and immediately discard new connections if we already have a valid one
  Clarify the (unchanged) behaviour for --netns-only in the man page.
  To actually make passt and pasta processes run in a separate PID namespace, we need to unshare(CLONE_NEWPID) before forking to background (if configured to do so). Introduce a small daemon() implementation, __daemon(), that additionally writes the PID file before forking. While running in foreground, the process itself can't move to a new PID namespace (a process can't change the notion of its own PID): mention that in the man page.
  For some reason, fork() in a detached PID namespace causes SIGTERM and SIGQUIT to be ignored, even if the handler is still reported as SIG_DFL: add a signal handler that just exits.
  We can now drop most of the pasta_child_handler() implementation, which took care of terminating all processes running in the same namespace if pasta started a shell: the shell itself is now the init process in that namespace, and all children will terminate once the init process exits.
  Issuing 'echo $$' in a detached PID namespace won't return the actual namespace PID as seen from the init namespace: adapt demo and test setup scripts to reflect that.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* slirp4netns: Add EXIT as condition for trap | Stefano Brivio | 2022-02-04 | 1 | -1/+1
  ...otherwise, we don't terminate pasta on regular exit, i.e. on a read from the "exit" file descriptor.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* slirp4netns: Look up pasta command, exit if not found | Stefano Brivio | 2022-02-04 | 1 | -1/+6
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test/perf/passt_tcp: Drop iperf3 window size for host-to-guest tests | Stefano Brivio | 2022-02-01 | 1 | -10/+10
  With a recent 5.15 kernel, passing a huge window size to iperf3 with lower MTUs makes iperf3 stop sending packets after a few seconds. I haven't investigated this in detail, but the window size will be adjusted dynamically anyway, and not passing it doesn't actually affect throughput, so simply drop the option.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* util: Avoid return of possibly truncated unsigned long in bitmap_isset() | Stefano Brivio | 2022-02-01 | 1 | -2/+2
  Oops. If *word & BITMAP_BIT(bit) doesn't fit in an int (which is the case for half of the possible bits of a bitmap on 64-bit archs), we'll return it converted to an int, that is, zero, even if the bit at hand is set.
  Just return zero or one there: no callers are interested in the actual bitmap as return value.
  Issue found as pasta wouldn't automatically detect some bound ports.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
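The truncation, and the zero-or-one fix, can be reproduced in a few lines. BITMAP_BIT() and BITMAP_WORD() below are illustrative definitions modeled on the names in the commit message, not copied from passt's util.h.

```c
#include <limits.h>
#include <stddef.h>

/* Illustrative definitions: one bit within an array of unsigned long */
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_BIT(n)	(1UL << ((n) % BITS_PER_WORD))
#define BITMAP_WORD(n)	((n) / BITS_PER_WORD)

/* Buggy: the unsigned long result is converted to int on return. On
 * 64-bit targets, GCC and Clang reduce it modulo 2^32, so a set bit in
 * positions 32 to 63 of a word is reported as zero. */
static int bitmap_isset_buggy(const unsigned long *map, unsigned int bit)
{
	return map[BITMAP_WORD(bit)] & BITMAP_BIT(bit);
}

/* Fixed: collapse the result to zero or one before returning */
static int bitmap_isset(const unsigned long *map, unsigned int bit)
{
	return !!(map[BITMAP_WORD(bit)] & BITMAP_BIT(bit));
}
```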
* test/distro/ubuntu: Skip apt-get update for 16.04 on powerpc | Stefano Brivio | 2022-01-30 | 1 | -1/+3
  Some recent change to xenial-updates broke dependencies for gcc, so it can't be installed anymore. Skipping apt-get update leaves gcc dependencies in a consistent state, though.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test/distro: Avoid race between display of ns_msg and netcat exiting | Stefano Brivio | 2022-01-30 | 4 | -0/+4
  The shell might report 'nc -6 -l -p 9999 > /tmp/ns_msg' as done even after the subsequent 'echo' is done: wait one second before reading out /tmp/ns_msg, to ensure we read that instead of the "Done" message.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Address new clang-tidy warnings from LLVM 13.0.1 | Stefano Brivio | 2022-01-30 | 9 | -25/+52
  clang-tidy from LLVM 13.0.1 reports some new warnings from these checkers:
  - altera-unroll-loops, altera-id-dependent-backward-branch: ignore for the time being, add a TODO item
  - bugprone-easily-swappable-parameters: ignore, nothing to do about those
  - readability-function-cognitive-complexity: ignore for the time being, add a TODO item
  - altera-struct-pack-align: ignore, alignment is forced in protocol headers
  - concurrency-mt-unsafe: ignore for the time being, add a TODO item
  Fix bugprone-implicit-widening-of-multiplication-result warnings, though: that's doable, and they seem to make sense.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* contrib: Introduce PoC for Kata Containers with user-mode networking | Stefano Brivio | 2022-01-28 | 2 | -0/+764
  passt can be used to implement user-mode networking for the Kata Containers runtime, so that networking setup doesn't need elevated privileges or capabilities.
  This commit adds the patch for the Kata Containers runtime and agent to support passt as networking model and endpoint, and some basic documentation.
  See contrib/kata-containers/README.md for more details and setup steps.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* libvirt, qemu: Move patches to new directory, contrib | Stefano Brivio | 2022-01-28 | 3 | -0/+0
  I'm about to add a new adaptation carrying out-of-tree patches for a Kata Containers PoC: move the existing out-of-tree patches to their own directory to keep things easy to find in the main one.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* qemu: Rebase patch for UNIX domain socket support to latest upstream | Stefano Brivio | 2022-01-28 | 1 | -45/+18
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt, tap: Daemonise once socket is ready without waiting for connection | Stefano Brivio | 2022-01-28 | 4 | -21/+51
  The existing behaviour is not really practical: an automated agent in charge of starting both qemu and passt would need to fork itself to start passt, because passt won't fork to background until qemu connects, and the agent needs to unblock to start qemu.
  Instead of waiting for a connection to daemonise, do it right away as soon as a socket is available: that can already be considered an initialised state.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test: Add basic documentation about test suite, and cool-retro-term profile | Stefano Brivio | 2022-01-28 | 1 | -0/+104
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test: Add distribution tests for several architectures and kernel versions | Stefano Brivio | 2022-01-28 | 6 | -2/+1299
  The new tests check the build, and a simple case with pasta sending a short message in both directions (namespace to init, init to namespace).
  Tests cover a mix of Debian, Fedora, OpenSUSE and Ubuntu combinations on aarch64, i386, ppc64, ppc64le, s390x, x86_64.
  Builds are tested starting from approximately glibc 2.19 and gcc 4.7, and actual functionality from approximately 4.4 kernels, glibc 2.25 and gcc 4.8, all the way up to current glibc/gcc/kernel versions.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test/lib/test: Introduce 'def' directive for frequently used patterns | Stefano Brivio | 2022-01-28 | 1 | -229/+267
  For distribution tests, we'll repeat some tests frequently. Add a 'def' directive that starts a block, ended by 'endef', whose execution can then be triggered by simply giving its name as a directive itself.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test/lib/term: Allow for a wider variety of prompt characters in pane_wait() | Stefano Brivio | 2022-01-28 | 1 | -7/+10
  We might have highlighting and slightly different prompts across different distributions: allow a more reasonable set of prompt strings to be accepted as prompts.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* README: Fix link to IGMP/MLD proxy ticket | Stefano Brivio | 2022-01-28 | 1 | -1/+1
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* README: Fix anchor for Performance section | Stefano Brivio | 2022-01-27 | 1 | -1/+1
  It shouldn't refer to the subsection under "Features".
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* conf: Fix support for --stderr as short option (-e) | Stefano Brivio | 2022-01-27 | 1 | -1/+9
  I forgot --stderr could also be -e, fix handling.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* hooks/pre-push: Delete old versions, add -DGLIBC_NO_STATIC_NSS, disable legacy builds | Stefano Brivio | 2022-01-26 | 1 | -7/+11
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test/lib/video: Drop -preset ultrafast from ffmpeg arguments | Stefano Brivio | 2022-01-26 | 1 | -1/+1
  It's not really needed on a reasonably powered CPU, and it makes the video contents way less readable.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test/lib/setup: Don't rely on IFS to properly separate qemu arguments | Stefano Brivio | 2022-01-26 | 1 | -32/+32
  ...this gets needlessly annoying while playing with test cases.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* perf/passt_udp: Lower failure throughput thresholds with big MTUs | Stefano Brivio | 2022-01-26 | 1 | -4/+4
  The throughput results in this test look quite variable, and slightly lower figures look reasonable anyway.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test/two_guests: Drop stray spaces after sleep directives | Stefano Brivio | 2022-01-26 | 1 | -4/+4
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* demo/pasta: Don't wait for pasta to return to a prompt | Stefano Brivio | 2022-01-26 | 1 | -1/+1
  Debug information might be printed after a prompt is seen: just wait those 3 seconds and be done with it.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* seccomp: Add a number of alternate and per-arch syscalls | Stefano Brivio | 2022-01-26 | 6 | -10/+15
  Depending on the C library, but not necessarily in all the functions we use, statx() might be used instead of stat(), getdents() instead of getdents64(), readlinkat() instead of readlink(), openat() instead of open().
  On aarch64, it's clone() and not fork(), and dup3() instead of dup2(): just allow the existing alternative instead of dealing with per-arch selections.
  Since glibc commit 9a7565403758 ("posix: Consolidate fork implementation"), we need to allow set_robust_list() for fork()/clone(), even in a single-threaded context.
  On some architectures, epoll_pwait() is provided instead of epoll_wait(), but never both. Same with newfstat() and fstat(), sigreturn() and rt_sigreturn(), getdents64() and getdents(), readlink() and readlinkat(), unlink() and unlinkat(), whereas pipe() might not be available, but pipe2() always is, exclusively or not.
  Seen on Fedora 34: newfstatat() is used on top of fstat().
  syslog() is an actual system call on some glibc/arch combinations, instead of a connect()/send() implementation.
  On ppc64 and ppc64le, _llseek(), recv(), send() and getuid() are used. For ppc64 only: ugetrlimit() for the getrlimit() implementation, plus sigreturn() and fcntl64().
  On s390x, additionally, we need to allow socketcall() (on top of socket()), and sigreturn() also for passt (not just for pasta).
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp: Don't round down MSS to >= 64KiB page size, but clamp it in any case | Stefano Brivio | 2022-01-26 | 1 | -3/+3
  On some architectures, the page size is bigger than the maximum size of an Ethernet frame.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* pasta: Check for zero d_reclen returned by getdents64() syscall | Stefano Brivio | 2022-01-26 | 1 | -1/+1
  Seen on PPC with some older kernel versions: we seemingly have bytes left to read from the returned array of dirent structs, but d_reclen is zero: this, and all the subsequent entries, are not valid.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
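A minimal sketch of a getdents64(2) loop with this check: a record with zero d_reclen ends parsing of the current buffer, which also prevents an endless loop on the same offset. The struct layout follows the getdents(2) man page; count_dirents() is a hypothetical helper, not the pasta code that scans /proc.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout returned by getdents64(2), as documented in its man page */
struct linux_dirent64 {
	uint64_t	d_ino;
	int64_t		d_off;
	unsigned short	d_reclen;
	unsigned char	d_type;
	char		d_name[];
};

/* Hypothetical helper: count the entries in directory 'path', including
 * "." and "..". Returns the count, or -1 on error. */
static long count_dirents(const char *path)
{
	char buf[4096] __attribute__((aligned(8)));
	long count = 0;
	int fd = open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);

	if (fd < 0)
		return -1;

	for (;;) {
		long n = syscall(SYS_getdents64, fd, buf, sizeof(buf));
		long off;

		if (n <= 0) {
			close(fd);
			return n < 0 ? -1 : count;
		}

		for (off = 0; off < n; ) {
			struct linux_dirent64 *d;

			d = (struct linux_dirent64 *)(buf + off);
			if (!d->d_reclen)	/* this and later entries invalid */
				break;
			count++;
			off += d->d_reclen;
		}
	}
}
```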
* netlink: Fix swapped v4/v6-only flags in external interface detection | Stefano Brivio | 2022-01-26 | 1 | -2/+2
  The effect of this typo became visible in an IPv6-only environment, where passt wouldn't work at all.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp, udp, util: Fixes for bitmap handling on big-endian, casts | Stefano Brivio | 2022-01-26 | 5 | -7/+16
  Bitmap manipulating functions would otherwise refer to inconsistent sets of bits on big-endian architectures.
  While at it, fix up a couple of casts.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* conf, pasta: Explicitly pass CLONE_{NEWUSER,NEWNET} to setns() | Stefano Brivio | 2022-01-26 | 3 | -6/+8
  Only allow the intended types of namespaces to be joined via setns(), as a defensive measure.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
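With a nonzero nstype, setns(2) fails with EINVAL if the file descriptor refers to a namespace of a different type, so a wrong path can't silently join, say, a mount namespace. A minimal sketch; join_netns() is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <unistd.h>

/* Hypothetical helper: join the network namespace referred to by 'path'
 * (for example /proc/PID/ns/net). Passing CLONE_NEWNET rather than 0 as
 * nstype makes the kernel reject descriptors for other namespace types.
 * Returns 0 on success, -1 on error with errno set. */
static int join_netns(const char *path)
{
	int fd = open(path, O_RDONLY | O_CLOEXEC);
	int ret, saved_errno;

	if (fd < 0)
		return -1;

	ret = setns(fd, CLONE_NEWNET);
	saved_errno = errno;	/* keep setns() errno across close() */
	close(fd);
	errno = saved_errno;

	return ret;
}
```

Joining a UTS namespace file through this helper fails with EINVAL (or EPERM, if setns() itself is filtered, as in some container runtimes).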
* tcp, netlink, HAS{BYTES_ACKED,MIN_RTT,GETRANDOM} and NETLINK_GET_STRICT_CHK | Stefano Brivio | 2022-01-26 | 3 | -2/+58
  tcpi_bytes_acked and tcpi_min_rtt are only available on recent kernel versions: provide fall-back paths (incurring some performance penalty).
  Support for getrandom() was introduced in Linux 3.17 and glibc 2.25: provide an alternate mechanism for that as well, reading from /dev/random.
  Also check if NETLINK_GET_STRICT_CHK is defined before using it: it's not strictly needed, as we'll filter out irrelevant results from netlink anyway.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
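A minimal sketch of the getrandom() fall-back. HAS_GETRANDOM in the commit is a build-time switch; this sketch approximates it with `#ifdef SYS_getrandom` and also falls back at run time on ENOSYS (new headers, old kernel). Both choices, and the name get_random_bytes(), are assumptions rather than passt's code.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: fill 'buf' with 'len' random bytes, using the
 * getrandom() system call if available, /dev/random otherwise.
 * Returns 0 on success, -1 on failure. */
static int get_random_bytes(void *buf, size_t len)
{
	unsigned char *p = buf;
	size_t done = 0;

#ifdef SYS_getrandom
	while (done < len) {
		long ret = syscall(SYS_getrandom, p + done, len - done, 0);

		if (ret < 0) {
			if (errno == EINTR)
				continue;
			if (errno == ENOSYS)	/* kernel older than 3.17 */
				break;
			return -1;
		}
		done += (size_t)ret;
	}
	if (done == len)
		return 0;
#endif

	{
		int fd = open("/dev/random", O_RDONLY | O_CLOEXEC);

		if (fd < 0)
			return -1;

		while (done < len) {
			ssize_t ret = read(fd, p + done, len - done);

			if (ret < 0 && errno == EINTR)
				continue;
			if (ret <= 0) {
				close(fd);
				return -1;
			}
			done += (size_t)ret;
		}

		close(fd);
		return 0;
	}
}
```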
* seccomp: Introduce mechanism to allow per-arch syscalls | Stefano Brivio | 2022-01-26 | 1 | -9/+48
  Some C library functions are commonly implemented by different syscalls on different architectures. Add a mechanism to allow selected syscalls for a single architecture. The syntax in #syscalls comments is:
    #syscalls <arch>:<name>
  e.g. s390x:socketcall, given that socketcall() is commonly used there instead of socket().
  This is now implemented by a compiler probe for syscall numbers: as a result, auditd tools (ausyscall) are no longer required.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
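The filtering rule implied by the `<arch>:<name>` syntax can be sketched as follows: plain names apply to every architecture, prefixed ones only to the matching one. This is written in C for consistency with the other examples; the actual mechanism in passt runs at build time together with the compiler probe, and syscall_for_arch() is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical filter for one token from a "#syscalls" comment. On a
 * match for 'arch', copy the bare syscall name into 'name' (of 'size'
 * bytes) and return true; otherwise return false. */
static bool syscall_for_arch(const char *token, const char *arch,
			     char *name, size_t size)
{
	const char *colon = strchr(token, ':');
	const char *n = token;

	if (colon) {
		size_t arch_len = (size_t)(colon - token);

		if (strlen(arch) != arch_len ||
		    strncmp(token, arch, arch_len))
			return false;	/* entry for another architecture */
		n = colon + 1;
	}

	if (strlen(n) >= size)
		return false;
	strcpy(name, n);

	return true;
}
```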
* util: Fall-back definitions for SECCOMP_RET_KILL_PROCESS, ETH_{MAX,MIN}_MTU | Stefano Brivio | 2022-01-26 | 1 | -0/+10
  They're not available on some older toolchains.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
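Such fall-backs are usually guarded with #ifndef, so newer headers take precedence. A minimal sketch using the values from current Linux UAPI headers (linux/seccomp.h and linux/if_ether.h); the exact form in passt's util.h may differ.

```c
#include <linux/if_ether.h>
#include <linux/seccomp.h>

/* Define these only if older kernel headers don't provide them */
#ifndef SECCOMP_RET_KILL_PROCESS
#define SECCOMP_RET_KILL_PROCESS	0x80000000U	/* kill whole process */
#endif
#ifndef ETH_MAX_MTU
#define ETH_MAX_MTU			0xFFFFU		/* 65535 */
#endif
#ifndef ETH_MIN_MTU
#define ETH_MIN_MTU			68		/* minimum IPv4 MTU */
#endif
```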