aboutgitcodebugslistschat
path: root/pasta.c
Commit message (Collapse)AuthorAgeFilesLines
...
* treewide: Fix android-cloexec-* clang-tidy warnings, re-enable checksStefano Brivio2022-03-291-7/+9
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* seccomp: Adjust list of allowed syscalls for armv6l, armv7lStefano Brivio2022-02-261-1/+2
| | | | | | | | | | | It looks like glibc commonly implements clock_gettime(2) with clock_gettime64(), and uses recv() instead of recvfrom(), send() instead of sendto(), and sigreturn() instead of rt_sigreturn() on armv6l and armv7l. Adjust the list of system calls for armv6l and armv7l accordingly. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* pasta: By default, quit if filesystem-bound net namespace goes awayStefano Brivio2022-02-211-0/+52
| | | | | | | | | | | | This should be convenient for users managing filesystem-bound network namespaces: monitor the base directory of the namespace and exit if the namespace given as PATH or NAME target is deleted. We can't add an inotify watch directly on the namespace directory, that won't work with nsfs. Add an option to disable this behaviour, --no-netns-quit. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt, pasta: Namespace-based sandboxing, defer seccomp policy applicationStefano Brivio2022-02-211-106/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To reach (at least) a conceptually equivalent security level as implemented by --enable-sandbox in slirp4netns, we need to create a new mount namespace and pivot_root() into a new (empty) mountpoint, so that passt and pasta can't access any filesystem resource after initialisation. While at it, also detach IPC, PID (only for passt, to prevent vulnerabilities based on the knowledge of a target PID), and UTS namespaces. With this approach, if we apply the seccomp filters right after the configuration step, the number of allowed syscalls grows further. To prevent this, defer the application of seccomp policies after the initialisation phase, before the main loop, that's where we expect bad things to happen, potentially. This way, we get back to 22 allowed syscalls for passt and 34 for pasta, on x86_64. While at it, move #syscalls notes to specific code paths wherever it conceptually makes sense. We have to open all the file handles we'll ever need before sandboxing: - the packet capture file can only be opened once, drop instance numbers from the default path and use the (pre-sandbox) PID instead - /proc/net/tcp{,v6} and /proc/net/udp{,v6}, for automatic detection of bound ports in pasta mode, are now opened only once, before sandboxing, and their handles are stored in the execution context - the UNIX domain socket for passt is also bound only once, before sandboxing: to reject clients after the first one, instead of closing the listening socket, keep it open, accept and immediately discard new connection if we already have a valid one Clarify the (unchanged) behaviour for --netns-only in the man page. To actually make passt and pasta processes run in a separate PID namespace, we need to unshare(CLONE_NEWPID) before forking to background (if configured to do so). Introduce a small daemon() implementation, __daemon(), that additionally saves the PID file before forking. While running in foreground, the process itself can't move to a new PID namespace (a process can't change the notion of its own PID): mention that in the man page. For some reason, fork() in a detached PID namespace causes SIGTERM and SIGQUIT to be ignored, even if the handler is still reported as SIG_DFL: add a signal handler that just exits. We can now drop most of the pasta_child_handler() implementation, that took care of terminating all processes running in the same namespace, if pasta started a shell: the shell itself is now the init process in that namespace, and all children will terminate once the init process exits. Issuing 'echo $$' in a detached PID namespace won't return the actual namespace PID as seen from the init namespace: adapt demo and test setup scripts to reflect that. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* seccomp: Add a number of alternate and per-arch syscallsStefano Brivio2022-01-261-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Depending on the C library, but not necessarily in all the functions we use, statx() might be used instead of stat(), getdents() instead of getdents64(), readlinkat() instead of readlink(), openat() instead of open(). On aarch64, it's clone() and not fork(), and dup3() instead of dup2() -- just allow the existing alternative instead of dealing with per-arch selections. Since glibc commit 9a7565403758 ("posix: Consolidate fork implementation"), we need to allow set_robust_list() for fork()/clone(), even in a single-threaded context. On some architectures, epoll_pwait() is provided instead of epoll_wait(), but never both. Same with newfstat() and fstat(), sigreturn() and rt_sigreturn(), getdents64() and getdents(), readlink() and readlinkat(), unlink() and unlinkat(), whereas pipe() might not be available, but pipe2() always is, exclusively or not. Seen on Fedora 34: newfstatat() is used on top of fstat(). syslog() is an actual system call on some glibc/arch combinations, instead of a connect()/send() implementation. On ppc64 and ppc64le, _llseek(), recv(), send() and getuid() are used. For ppc64 only: ugetrlimit() for the getrlimit() implementation, plus sigreturn() and fcntl64(). On s390x, additionally, we need to allow socketcall() (on top of socket()), and sigreturn() also for passt (not just for pasta). Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* pasta: Check for zero d_reclen returned by getdents64() syscallStefano Brivio2022-01-261-1/+1
| | | | | | | | Seen on PPC with some older kernel versions: we seemingly have bytes left to read from the returned array of dirent structs, but d_reclen is zero: this, and all the subsequent entries, are not valid. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* conf, pasta: Explicitly pass CLONE_{NEWUSER,NEWNET} to setns()Stefano Brivio2022-01-261-2/+4
| | | | | | | Only allow the intended types of namespaces to be joined via setns() as a defensive measure. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Drop <linux/ipv6.h> include, carry own ipv6hdr and opt_hdr definitionsStefano Brivio2022-01-261-2/+0
| | | | | | | This is the only remaining Linux-specific include -- drop it to avoid clang-tidy warnings and to make code more portable. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Add cppcheck target, test, and address resulting warningsStefano Brivio2021-10-211-4/+7
| | | | | | | ...mostly false positives, but a number of very relevant ones too, in tcp_get_sndbuf(), tcp_conn_from_tap(), and siphash PREAMBLE(). Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Fix build with gcc 7, use std=c99, enable some more Clang checkersStefano Brivio2021-10-211-1/+2
| | | | | | | | | | | | | | Unions and structs, you all have names now. Take the chance to enable bugprone-reserved-identifier, cert-dcl37-c, and cert-dcl51-cpp checkers in clang-tidy. Provide a ffsl() weak declaration using gcc built-in. Start reordering includes, but that's not enough for the llvm-include-order checker yet. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Add clang-tidy Makefile target and test, take care of warningsStefano Brivio2021-10-201-1/+1
| | | | | | | Most are just about style and form, but a few were actually serious mistakes (NDP-related). Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Address gcc 11 warningsStefano Brivio2021-10-201-5/+10
| | | | | | | | | A mix of unchecked return values, a missing permission mask for open(2) with O_CREAT, and some false positives from -Wstringop-overflow and -Wmaybe-uninitialized. Reported-by: Martin Hauke <mardnh@gmx.de> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* netlink, conf: Actually get prefix/mask lengthStefano Brivio2021-10-191-2/+7
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* pasta: Allow nanosleep(2) and clock_nanosleep(2) syscalls tooStefano Brivio2021-10-141-1/+1
| | | | | | ...we need those to wait for terminating processes in the namespace. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* netlink, pasta: Configure MTU of tap interface on --config-netStefano Brivio2021-10-141-3/+3
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt, pasta: Completely avoid dynamic memory allocationStefano Brivio2021-10-141-0/+253
Replace libc functions that might dynamically allocate memory with own implementations or wrappers. Drop brk(2) from list of allowed syscalls in seccomp profile. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>