aboutgitcodebugslistschat
Commit message (Collapse)AuthorAgeFilesLines
* passt: Make process not dumpable after sandboxingStefano Brivio2022-02-211-0/+2
| | | | | | | | | | | | | Two effects: - ptrace() on passt and pasta can only be done by root, so that even if somebody gains access to the same user, they won't be able to check data passed in syscalls anyway. No core dumps allowed either - /proc/PID files are owned by root:root, and they can't be read by the same user as the one passt or pasta are running with Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt, pasta: Namespace-based sandboxing, defer seccomp policy applicationStefano Brivio2022-02-2117-262/+365
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To reach (at least) a conceptually equivalent security level as implemented by --enable-sandbox in slirp4netns, we need to create a new mount namespace and pivot_root() into a new (empty) mountpoint, so that passt and pasta can't access any filesystem resource after initialisation. While at it, also detach IPC, PID (only for passt, to prevent vulnerabilities based on the knowledge of a target PID), and UTS namespaces. With this approach, if we apply the seccomp filters right after the configuration step, the number of allowed syscalls grows further. To prevent this, defer the application of seccomp policies after the initialisation phase, before the main loop, that's where we expect bad things to happen, potentially. This way, we get back to 22 allowed syscalls for passt and 34 for pasta, on x86_64. While at it, move #syscalls notes to specific code paths wherever it conceptually makes sense. We have to open all the file handles we'll ever need before sandboxing: - the packet capture file can only be opened once, drop instance numbers from the default path and use the (pre-sandbox) PID instead - /proc/net/tcp{,v6} and /proc/net/udp{,v6}, for automatic detection of bound ports in pasta mode, are now opened only once, before sandboxing, and their handles are stored in the execution context - the UNIX domain socket for passt is also bound only once, before sandboxing: to reject clients after the first one, instead of closing the listening socket, keep it open, accept and immediately discard new connection if we already have a valid one Clarify the (unchanged) behaviour for --netns-only in the man page. To actually make passt and pasta processes run in a separate PID namespace, we need to unshare(CLONE_NEWPID) before forking to background (if configured to do so). Introduce a small daemon() implementation, __daemon(), that additionally saves the PID file before forking. While running in foreground, the process itself can't move to a new PID namespace (a process can't change the notion of its own PID): mention that in the man page. For some reason, fork() in a detached PID namespace causes SIGTERM and SIGQUIT to be ignored, even if the handler is still reported as SIG_DFL: add a signal handler that just exits. We can now drop most of the pasta_child_handler() implementation, that took care of terminating all processes running in the same namespace, if pasta started a shell: the shell itself is now the init process in that namespace, and all children will terminate once the init process exits. Issuing 'echo $$' in a detached PID namespace won't return the actual namespace PID as seen from the init namespace: adapt demo and test setup scripts to reflect that. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* slirp4netns: Add EXIT as condition for trapStefano Brivio2022-02-041-1/+1
| | | | | | | ...otherwise, we don't terminate pasta on regular exit, i.e. on a read from the "exit" file descriptor. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* slirp4netns: Look up pasta command, exit if not foundStefano Brivio2022-02-041-1/+6
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test/perf/passt_tcp: Drop iperf3 window size for host-to-guest testsStefano Brivio2022-02-011-10/+10
| | | | | | | | | | With a recent 5.15 kernel, passing a huge window size to iperf3 with lower MTUs makes iperf3 stop sending packets after a few seconds -- I haven't investigated this in detail, but the window size will be adjusted dynamically anyway and not passing it doesn't actually affect throughput, so simply drop the option. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* util: Avoid return of possibly truncated unsigned long in bitmap_isset()Stefano Brivio2022-02-011-2/+2
| | | | | | | | | | | | | Oops. If *word & BITMAP_BIT(bit) is bigger than an int (which is the case for half of the possible bits of a bitmap on 64-bit archs), we'll return that as an int, that is, zero, even if the bit at hand is set. Just return zero or one there, no callers are interested in the actual bitmap as return value. Issue found as pasta wouldn't automatically detect some bound ports. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test/distro/ubuntu: Skip apt-get update for 16.04 on powerpcStefano Brivio2022-01-301-1/+3
| | | | | | | | Some recent change to xenial-updates broke dependencies for gcc, it can't be installed anymore. Skipping apt-get update leaves gcc dependencies in a consistent state, though. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test/distro: Avoid race between display of ns_msg and netcat exitingStefano Brivio2022-01-304-0/+4
| | | | | | | | | The shell might report 'nc -6 -l -p 9999 > /tmp/ns_msg' as done even after the subsequent 'echo' is done: wait one second before reading out /tmp/ns_msg, to ensure we read that instead of the "Done" message. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Address new clang-tidy warnings from LLVM 13.0.1Stefano Brivio2022-01-309-25/+52
| | | | | | | | | | | | | | | | | | | | | | | | | clang-tidy from LLVM 13.0.1 reports some new warnings from these checkers: - altera-unroll-loops, altera-id-dependent-backward-branch: ignore for the moment being, add a TODO item - bugprone-easily-swappable-parameters: ignore, nothing to do about those - readability-function-cognitive-complexity: ignore for the moment being, add a TODO item - altera-struct-pack-align: ignore, alignment is forced in protocol headers - concurrency-mt-unsafe: ignore for the moment being, add a TODO item Fix bugprone-implicit-widening-of-multiplication-result warnings, though, that's doable and they seem to make sense. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* contrib: Introduce PoC for Kata Containers with user-mode networkingStefano Brivio2022-01-282-0/+764
| | | | | | | | | | | | | | | passt can be used to implement user-mode networking for the Kata Containers runtime, so that networking setup doesn't need elevated privileges or capabilities. This commit adds the patch for Kata Containers runtime and agent to support passt as networking model and endpoint, and some basic documentation. See contrib/kata-containers/README.md for more details and setup steps. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* libvirt, qemu: Move patches to new directory, contribStefano Brivio2022-01-283-0/+0
| | | | | | | | I'm about to add a new adaptation carrying out-of-tree patches for a Kata Containers PoC -- move the existing out-of-tree patches to their own directory to keep things easy to find in the main one. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* qemu: Rebase patch for UNIX domain socket support to latest upstreamStefano Brivio2022-01-281-45/+18
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt, tap: Daemonise once socket is ready without waiting for connectionStefano Brivio2022-01-284-21/+51
| | | | | | | | | | | | | The existing behaviour is not really practical: an automated agent in charge of starting both qemu and passt would need to fork itself to start passt, because passt won't fork to background until qemu connects, and the agent needs to unblock to start qemu. Instead of waiting for a connection to daemonise, do it right away as soon as a socket is available: that can be considered an initialised state already. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test: Add basic documentation about test suite, and cool-retro-term profileStefano Brivio2022-01-281-0/+104
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test: Add distribution tests for several architectures and kernel versionsStefano Brivio2022-01-286-2/+1299
| | | | | | | | | | | | | | | The new tests check build and a simple case with pasta sending a short message in both directions (namespace to init, init to namespace). Tests cover a mix of Debian, Fedora, OpenSUSE and Ubuntu combinations on aarch64, i386, ppc64, ppc64le, s390x, x86_64. Builds tested starting from approximately glibc 2.19, gcc 4.7, and actual functionality approximately from 4.4 kernels, glibc 2.25, gcc 4.8, all the way up to current glibc/gcc/kernel versions. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test/lib/test: Introduce 'def' directive for frequently used patternsStefano Brivio2022-01-281-229/+267
| | | | | | | | | For distribution tests, we'll repeat some tests frequently. Add a 'def' directive that starts a block, ended by 'endef', whose execution can then be triggered by simply giving its name as a directive itself. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test/lib/term: Allow for a wider variety of prompt characters in pane_wait()Stefano Brivio2022-01-281-7/+10
| | | | | | | | We might have highlighting and slightly different prompts across different distributions, allow a more reasonable set of prompt strings to be accepted as prompts. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* README: Fix link to IGMP/MLD proxy ticketStefano Brivio2022-01-281-1/+1
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* README: Fix anchor for Performance sectionStefano Brivio2022-01-271-1/+1
| | | | | | It shouldn't refer to the subsection under "Features". Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* conf: Fix support for --stderr as short option (-e)Stefano Brivio2022-01-271-1/+9
| | | | | | I forgot --stderr could also be -e, fix handling. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* hooks/pre-push: Delete old versions, add -DGLIBC_NO_STATIC_NSS, disable ↵Stefano Brivio2022-01-261-7/+11
| | | | | | legacy builds Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test/lib/video: Drop -preset ultrafast from ffmpeg argumentsStefano Brivio2022-01-261-1/+1
| | | | | | | It's not really needed on a reasonably powered CPU, and makes the video contents way less readable. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test/lib/setup: Don't rely on IFS to properly separate qemu argumentsStefano Brivio2022-01-261-32/+32
| | | | | | ...this gets needlessly annoying while playing with test cases. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* perf/passt_udp: Lower failure throughput thresholds with big MTUsStefano Brivio2022-01-261-4/+4
| | | | | | | The throughput results in this test look quite variable, slightly lower figures look reasonable anyway. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test/two_guests: Drop stray spaces after sleep directivesStefano Brivio2022-01-261-4/+4
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* demo/pasta: Don't wait for pasta to return to a promptStefano Brivio2022-01-261-1/+1
| | | | | | | Debug information might be printed after a prompt is seen, just wait those 3 seconds and be done with it. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* seccomp: Add a number of alternate and per-arch syscallsStefano Brivio2022-01-266-10/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Depending on the C library, but not necessarily in all the functions we use, statx() might be used instead of stat(), getdents() instead of getdents64(), readlinkat() instead of readlink(), openat() instead of open(). On aarch64, it's clone() and not fork(), and dup3() instead of dup2() -- just allow the existing alternative instead of dealing with per-arch selections. Since glibc commit 9a7565403758 ("posix: Consolidate fork implementation"), we need to allow set_robust_list() for fork()/clone(), even in a single-threaded context. On some architectures, epoll_pwait() is provided instead of epoll_wait(), but never both. Same with newfstat() and fstat(), sigreturn() and rt_sigreturn(), getdents64() and getdents(), readlink() and readlinkat(), unlink() and unlinkat(), whereas pipe() might not be available, but pipe2() always is, exclusively or not. Seen on Fedora 34: newfstatat() is used on top of fstat(). syslog() is an actual system call on some glibc/arch combinations, instead of a connect()/send() implementation. On ppc64 and ppc64le, _llseek(), recv(), send() and getuid() are used. For ppc64 only: ugetrlimit() for the getrlimit() implementation, plus sigreturn() and fcntl64(). On s390x, additionally, we need to allow socketcall() (on top of socket()), and sigreturn() also for passt (not just for pasta). Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp: Don't round down MSS to >= 64KiB page size, but clamp it in any caseStefano Brivio2022-01-261-3/+3
| | | | | | | On some architectures, the page size is bigger than the maximum size of an Ethernet frame. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* pasta: Check for zero d_reclen returned by getdents64() syscallStefano Brivio2022-01-261-1/+1
| | | | | | | | Seen on PPC with some older kernel versions: we seemingly have bytes left to read from the returned array of dirent structs, but d_reclen is zero: this, and all the subsequent entries, are not valid. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* netlink: Fix swapped v4/v6-only flags in external interface detectionStefano Brivio2022-01-261-2/+2
| | | | | | | The effect of this typo became visible in an IPv6-only environment, where passt wouldn't work at all. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp, udp, util: Fixes for bitmap handling on big-endian, castsStefano Brivio2022-01-265-7/+16
| | | | | | | | Bitmap manipulating functions would otherwise refer to inconsistent sets of bits on big-endian architectures. While at it, fix up a couple of casts. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* conf, pasta: Explicitly pass CLONE_{NEWUSER,NEWNET} to setns()Stefano Brivio2022-01-263-6/+8
| | | | | | | Only allow the intended types of namespaces to be joined via setns() as a defensive measure. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp, netlink, HAS{BYTES_ACKED,MIN_RTT,GETRANDOM} and NETLINK_GET_STRICT_CHKStefano Brivio2022-01-263-2/+58
| | | | | | | | | | | | | | | | tcpi_bytes_acked and tcpi_min_rtt are only available on recent kernel versions: provide fall-back paths (incurring some grade of performance penalty). Support for getrandom() was introduced in Linux 3.17 and glibc 2.25: provide an alternate mechanism for that as well, reading from /dev/random. Also check if NETLINK_GET_STRICT_CHK is defined before using it: it's not strictly needed, we'll filter out irrelevant results from netlink anyway. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* seccomp: Introduce mechanism to allow per-arch syscallsStefano Brivio2022-01-261-9/+48
| | | | | | | | | | | | | | | | Some C library functions are commonly implemented by different syscalls on different architectures. Add a mechanism to allow selected syscalls for a single architecture, syntax in #syscalls comment is: #syscalls <arch>:<name> e.g. s390x:socketcall, given that socketcall() is commonly used there instead of socket(). This is now implemented by a compiler probe for syscall numbers, auditd tools (ausyscall) are not required anymore as a result. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* util: Fall-back definitions for SECCOMP_RET_KILL_PROCESS, ETH_{MAX,MIN}_MTUStefano Brivio2022-01-261-0/+10
| | | | | | They're not available on some older toolchains. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Makefile, seccomp: Fix build for i386, ppc64, ppc64leStefano Brivio2022-01-263-4/+18
| | | | | | | | | | | | | | | | | On some distributions, on ppc64, ulimit -s returns 'unlimited': add a reasonable default, and also make sure ulimit is invoked using the default shell, which should ensure ulimit is actually implemented. Also note that AUDIT_ARCH doesn't follow closely the naming reported by 'uname -m': convert for i386 and ppc as needed. While at it, move inclusion of seccomp.h after util.h, the former is less generic (cosmetic/clang-tidy only). Older kernel headers might lack a definition for AUDIT_ARCH_PPC64LE: define that explicitly if it's not available. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Drop <linux/ipv6.h> include, carry own ipv6hdr and opt_hdr definitionsStefano Brivio2022-01-2616-24/+28
| | | | | | | This is the only remaining Linux-specific include -- drop it to avoid clang-tidy warnings and to make code more portable. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tap, tcp: Fix two comparisons with different signedness reported by gcc 7Stefano Brivio2022-01-262-2/+2
| | | | | | For some reason, those are not reported by recent versions of gcc. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp: Cover all usages of tcpi_snd_wnd with HAS_SND_WNDStefano Brivio2022-01-261-0/+2
| | | | | | ...I forgot two of them. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* README: Feature list, links to lists, bugs, chatStefano Brivio2021-10-231-10/+122
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* README, perf_report: Markdown and CSS fixesStefano Brivio2021-10-222-28/+30
| | | | | | Updating md2html on the server needs a few adjustments. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* slirp4netns.sh: Fix up usage, exit 0 on --helpStefano Brivio2021-10-211-5/+6
| | | | | | | | | | Based on an original patch by Giuseppe Scrivano: there's no need to pass $0 to usage, drop that everywhere, and make it consistent. Don't exit with error on -h, --help. Suggested-by: Giuseppe Scrivano <gscrivan@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* seccomp: Add newfstatat to list of allowed syscallsStefano Brivio2021-10-211-1/+1
| | | | | | | ...it looks like, on a recent Fedora installation, daemon() uses it. Reported-by: Giuseppe Scrivano <gscrivan@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* netlink: Fix length of address attributeStefano Brivio2021-10-211-2/+2
| | | | | | | | ...I broke this while playing with clang-tidy, and didn't add tests for pasta's --config-net yet. Reported-by: Giuseppe Scrivano <gscrivan@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Fork into background also if not running from a terminalStefano Brivio2021-10-212-2/+2
| | | | | | | | | This is actually annoying: there's no way to make it fork into background when running from a script. However, it's always possible to keep it in foreground with -f. Make it simpler, and always fork into background if -f is not given. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test/two_guests: Fix sleep command for DADStefano Brivio2021-10-211-1/+2
| | | | | | | | | An inline comment prefixed by a space doesn't mean the space is dropped, and sleep(1) will get a blank in its argument. Move the comment on its own line. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test/two_guests: Let the guests end DAD before starting the DHCPv6 clientStefano Brivio2021-10-211-0/+1
| | | | | | | They'll start DAD as we bring up the interface, and the DHCPv6 client might be unreasonably delayed if we start it too early. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp: Restore source address to network endianness before using it for hash tableStefano Brivio2021-10-211-2/+3
| | | | | | | | | This was actually fine "on the wire", but it's inconsistent with the way we hash other addresses/protocols and also ends up with a wrong endianness in captures in case we replace the address with our default gateway. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* pcap: Fix failure check on write() in pcapm()Stefano Brivio2021-10-211-1/+2
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* ndp: Don't send a DNS search list if we don't have a list of DNS serversStefano Brivio2021-10-211-2/+3
| | | | | | | This is not explicitly forbidden, but it confuses the ISC's DHCP client, and doesn't make sense anyway. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>