aboutgitcodebugslistschat
path: root/pasta.c
Commit message (Collapse)AuthorAgeFilesLines
* util, pasta: Add do_clone() wrapper around __clone2() and clone()Stefano Brivio2022-11-161-5/+5
| | | | | | | | | | | | | | | Spotted in Debian's buildd logs: on ia64, clone(2) is not available: the glibc wrapper is named __clone2() and it takes, additionally, the size of the stack area passed by the caller. Add a do_clone() wrapper handling the different cases, and also taking care of pointing the child's stack in the middle of the allocated area: on PA-RISC (hppa), handled by clone(), the stack grows up, and on ia64 the stack grows down, but the register backing store grows up -- and I think it might be actually used here. Suggested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Minor improvements to IPv4 netmask handlingDavid Gibson2022-11-041-5/+2
| | | | | | | | | | | | | | | | There are several minor problems with our parsing of IPv4 netmasks (-n). First, we don't reject nonsensical netmasks like 0.255.0.255. Address this structurally by using prefix length instead of netmask as the primary variable, only converting (and validating) when we need to. This has the added benefit of making some things more uniform with the IPv6 path. Second, when the user specifies a prefix length, we truncate the output from strtol() to an integer, which means we would treat -n 4294967320 as valid (equivalent to 24). Fix types to check for this. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Rename pasta_setup_ns() to pasta_spawn_cmd()David Gibson2022-10-151-9/+9
| | | | | | | | | | pasta_setup_ns() no longer has much to do with setting up a namespace. Instead it's really about starting the shell or other command we want to run with pasta connectivity. Rename it and its argument structure to be less misleading. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* isolation: Only configure UID/GID mappings in userns when spawning shellDavid Gibson2022-10-151-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When in passt mode, or pasta mode spawning a command, we create a userns for ourselves. This is used both to isolate the pasta/passt process itself and to run the spawned command, if any. Since eed17a47 "Handle userns isolation and dropping root at the same time" we've handled both cases the same, configuring the UID and GID mappings in the new userns to map whichever UID we're running as to root within the userns. This mapping is desirable when spawning a shell or other command, so that the user gets a root shell with reasonably clear abilities within the userns and netns. It's not necessarily essential, though. When not spawning a shell, it doesn't really have any purpose: passt itself doesn't need to be root and can operate fine with an unmapped user (using some of the capabilities we get when entering the userns instead). Configuring the uid_map can cause problems if passt is running with any capabilities in the initial namespace, such as CAP_NET_BIND_SERVICE to allow it to forward low ports. In this case the kernel makes files in /proc/pid owned by root rather than the starting user to prevent the user from interfering with the operation of the capability-enhanced process. This includes uid_map meaning we are not able to write to it. Whether this behaviour is correct in the kernel is debatable, but in any case we might as well avoid problems by only initializing the user mappings when we really want them. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Replace FWRITE with a functionDavid Gibson2022-10-151-2/+2
| | | | | | | | | | | | | | In a few places we use the FWRITE() macro to open a file, replace it's contents with a given string and close it again. There's no real reason this needs to be a macro rather than just a function though. Turn it into a function 'write_file()' and make some ancillary cleanups while we're there: - Add a return code so the caller can handle giving a useful error message - Handle the case of short write()s (unlikely, but possible) - Add O_TRUNC, to make sure we replace the existing contents entirely Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Remove unhelpful drop_caps() call in pasta_start_ns()David Gibson2022-10-151-2/+0
| | | | | | | | | | | | | | | | | | | | drop_caps() has a number of bugs which mean it doesn't do what you'd expect. However, even if we fixed those, the call in pasta_start_ns() doesn't do anything useful: * In the common case, we're UID 0 at this point. In this case drop_caps() doesn't accomplish anything, because even with capabilities dropped, we are still privileged. * When attaching to an existing namespace with --userns or --netns-only we might not be UID 0. In this case it's too early to drop all capabilities: we need at least CAP_NET_ADMIN to configure the tap device in the namespace. Remove this call - we will still drop capabilities a little later in sandbox(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* pasta_start_ns() always ends in parent contextDavid Gibson2022-10-151-4/+1
| | | | | | | | | | | The end of pasta_start_ns() has a test against pasta_child_pid, testing if we're in the parent or the child. However we started the child running the pasta_setup_ns function which always exec()s or exit()s, so if we return from the clone() we are always in the parent, making that test unnecessary. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* pasta: More general way of starting spawned shell as a login shellDavid Gibson2022-10-151-12/+20
| | | | | | | | | | | | | | When invoked so as to spawn a shell, pasta checks explicitly for the shell being bash and if so, adds a "-l" option to make it a login shell. This is not ideal, since this is a bash specific option and requires pasta to know about specific shell variants. There's a general convention for starting a login shell, which is to prepend a "-" to argv[0]. Use this approach instead, so we don't need bash specific logic. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Move logging functions to a new file, log.cStefano Brivio2022-10-141-0/+1
| | | | | | | | Logging to file is going to add some further complexity that we don't want to squeeze into util.c. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* clang-tidy: Fix spurious null pointer warning in pasta_start_ns()David Gibson2022-09-291-1/+4
| | | | | | | | | | | | clang-tidy isn't quite clever enough to figure out that getenv("SHELL") will return the same thing both times here, which makes it conclude that shell could be NULL, causing problems later. It's a bit ugly that we call getenv() twice in any case, so rework this in a way that clang-tidy can figure out shell won't be NULL. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Handle userns isolation and dropping root at the same timeDavid Gibson2022-09-131-49/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | passt/pasta can interact with user namespaces in a number of ways: 1) With --netns-only we'll remain in our original user namespace 2) With --userns or a PID option to pasta we'll join either the given user namespace or that of the PID 3) When pasta spawns a shell or command we'll start a new user namespace for the command and then join it 4) With passt we'll create a new user namespace when we sandbox() ourself However (3) and (4) turn out to have essentially the same effect. In both cases we create one new user namespace. The spawned command starts there, and passt/pasta itself will live there from sandbox() onwards. Because of this, we can simplify user namespace handling by moving the userns handling earlier, to the same point we drop root in the original namespace. Extend the drop_user() function to isolate_user() which does both. After switching UID and GID in the original userns, isolate_user() will either join or create the userns we require. When we spawn a command with pasta_start_ns()/pasta_setup_ns() we no longer need to create a userns, because we're already made one. sandbox() likewise no longer needs to create (or join) an userns because we're already in the one we need. We no longer need c->pasta_userns_fd, since the fd is only used locally in isolate_user(). Likewise we can replace c->netns_only with a local in conf(), since it's not used outside there. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* Correctly handle --netns-only in pasta_start_ns()David Gibson2022-09-131-2/+2
| | | | | | | | | --netns-only is supposed to make pasta use only a network namespace, not a user namespace. However, pasta_start_ns() has this backwards, and if --netns-only is specified it creates a user namespace but *not* a network namespace. Correct this. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* Clean up and rename conf_ns_open()David Gibson2022-09-131-0/+66
| | | | | | | | | | | | | | conf_ns_open() opens file descriptors for the namespaces pasta needs, but it doesnt really have anything to do with configuration any more. For better clarity, move it to pasta.c and rename it pasta_open_ns(). This makes the symmetry between it and pasta_start_ns() more clear, since these represent the two basic ways that pasta can operate, either attaching to an existing namespace/process or spawning a new one. Since its no longer validating options, the errors it could return shouldn't cause a usage message. Just exit directly with an error instead. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* Move self-isolation code into a separate fileDavid Gibson2022-09-131-0/+1
| | | | | | | | passt/pasta contains a number of routines designed to isolate passt from the rest of the system for security. These are spread through util.c and passt.c. Move them together into a new isolation.c file. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* Allow pasta to take a command to executeDavid Gibson2022-08-301-10/+23
| | | | | | | | | | | | | | When not given an existing PID or network namspace to attach to, pasta spawns a shell. Most commands which can spawn a shell in an altered environment can also run other commands in that same environment, which can be useful in automation. Allow pasta to do the same thing; it can be given an arbitrary command to run in the network and user namespace which pasta creates. If neither a command nor an existing PID or netns to attach to is given, continue to spawn a default shell, as before. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* Don't unnecessarily avoid CLOEXEC flags2022_08_24.60ffc5bDavid Gibson2022-08-241-1/+1
| | | | | | | | | | | | | | | | There are several places in the passt code where we have lint overrides because we're not adding CLOEXEC flags to open or other operations. Comments suggest this is because it's before we fork() into the background but we'll need those file descriptors after we're in the background. However, as the name suggests CLOEXEC closes on exec(), not on fork(). The only place we exec() is either super early invoke the avx2 version of the binary, or when we start a shell in pasta mode, which certainly *doesn't* require the fds in question. Add the CLOEXEC flag in those places, and remove the lint overrides. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* Make substructures for IPv4 and IPv6 specific context informationDavid Gibson2022-07-301-5/+5
| | | | | | | | | | | | The context structure contains a batch of fields specific to IPv4 and to IPv6 connectivity. Split those out into a sub-structure. This allows the conf_ip4() and conf_ip6() functions, which take the entire context but touch very little of it, to be given more specific parameters, making it clearer what it affects without stepping through the code. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* Separate IPv4 and IPv6 configurationDavid Gibson2022-07-301-2/+2
| | | | | | | | | | | | | | | | | | | After recent changes, conf_ip() now has essentially entirely disjoint paths for IPv4 and IPv6 configuration. So, it's cleaner to split them out into different functions conf_ip4() and conf_ip6(). Splitting these out also lets us make the interface a bit nicer, having them return success or failure directly, rather than manipulating c->v4 and c->v6 to indicate success/failure of the two versions. Since these functions may also initialize the interface index for each protocol, it turns out we can then drop c->v4 and c->v6 entirely, replacing tests on those with tests on whether c->ifi4 or c->ifi6 is non-zero (since a 0 interface index is never valid). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Whitespace fixes] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* treewide: Argument cannot be negative, CWE-687Stefano Brivio2022-04-071-17/+8
| | | | | | Actually harmless. Reported by Coverity. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* treewide: Fix android-cloexec-* clang-tidy warnings, re-enable checksStefano Brivio2022-03-291-7/+9
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* seccomp: Adjust list of allowed syscalls for armv6l, armv7lStefano Brivio2022-02-261-1/+2
| | | | | | | | | | | It looks like glibc commonly implements clock_gettime(2) with clock_gettime64(), and uses recv() instead of recvfrom(), send() instead of sendto(), and sigreturn() instead of rt_sigreturn() on armv6l and armv7l. Adjust the list of system calls for armv6l and armv7l accordingly. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* pasta: By default, quit if filesystem-bound net namespace goes awayStefano Brivio2022-02-211-0/+52
| | | | | | | | | | | | This should be convenient for users managing filesystem-bound network namespaces: monitor the base directory of the namespace and exit if the namespace given as PATH or NAME target is deleted. We can't add an inotify watch directly on the namespace directory, that won't work with nsfs. Add an option to disable this behaviour, --no-netns-quit. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt, pasta: Namespace-based sandboxing, defer seccomp policy applicationStefano Brivio2022-02-211-106/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To reach (at least) a conceptually equivalent security level as implemented by --enable-sandbox in slirp4netns, we need to create a new mount namespace and pivot_root() into a new (empty) mountpoint, so that passt and pasta can't access any filesystem resource after initialisation. While at it, also detach IPC, PID (only for passt, to prevent vulnerabilities based on the knowledge of a target PID), and UTS namespaces. With this approach, if we apply the seccomp filters right after the configuration step, the number of allowed syscalls grows further. To prevent this, defer the application of seccomp policies after the initialisation phase, before the main loop, that's where we expect bad things to happen, potentially. This way, we get back to 22 allowed syscalls for passt and 34 for pasta, on x86_64. While at it, move #syscalls notes to specific code paths wherever it conceptually makes sense. We have to open all the file handles we'll ever need before sandboxing: - the packet capture file can only be opened once, drop instance numbers from the default path and use the (pre-sandbox) PID instead - /proc/net/tcp{,v6} and /proc/net/udp{,v6}, for automatic detection of bound ports in pasta mode, are now opened only once, before sandboxing, and their handles are stored in the execution context - the UNIX domain socket for passt is also bound only once, before sandboxing: to reject clients after the first one, instead of closing the listening socket, keep it open, accept and immediately discard new connection if we already have a valid one Clarify the (unchanged) behaviour for --netns-only in the man page. To actually make passt and pasta processes run in a separate PID namespace, we need to unshare(CLONE_NEWPID) before forking to background (if configured to do so). Introduce a small daemon() implementation, __daemon(), that additionally saves the PID file before forking. While running in foreground, the process itself can't move to a new PID namespace (a process can't change the notion of its own PID): mention that in the man page. For some reason, fork() in a detached PID namespace causes SIGTERM and SIGQUIT to be ignored, even if the handler is still reported as SIG_DFL: add a signal handler that just exits. We can now drop most of the pasta_child_handler() implementation, that took care of terminating all processes running in the same namespace, if pasta started a shell: the shell itself is now the init process in that namespace, and all children will terminate once the init process exits. Issuing 'echo $$' in a detached PID namespace won't return the actual namespace PID as seen from the init namespace: adapt demo and test setup scripts to reflect that. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* seccomp: Add a number of alternate and per-arch syscallsStefano Brivio2022-01-261-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Depending on the C library, but not necessarily in all the functions we use, statx() might be used instead of stat(), getdents() instead of getdents64(), readlinkat() instead of readlink(), openat() instead of open(). On aarch64, it's clone() and not fork(), and dup3() instead of dup2() -- just allow the existing alternative instead of dealing with per-arch selections. Since glibc commit 9a7565403758 ("posix: Consolidate fork implementation"), we need to allow set_robust_list() for fork()/clone(), even in a single-threaded context. On some architectures, epoll_pwait() is provided instead of epoll_wait(), but never both. Same with newfstat() and fstat(), sigreturn() and rt_sigreturn(), getdents64() and getdents(), readlink() and readlinkat(), unlink() and unlinkat(), whereas pipe() might not be available, but pipe2() always is, exclusively or not. Seen on Fedora 34: newfstatat() is used on top of fstat(). syslog() is an actual system call on some glibc/arch combinations, instead of a connect()/send() implementation. On ppc64 and ppc64le, _llseek(), recv(), send() and getuid() are used. For ppc64 only: ugetrlimit() for the getrlimit() implementation, plus sigreturn() and fcntl64(). On s390x, additionally, we need to allow socketcall() (on top of socket()), and sigreturn() also for passt (not just for pasta). Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* pasta: Check for zero d_reclen returned by getdents64() syscallStefano Brivio2022-01-261-1/+1
| | | | | | | | Seen on PPC with some older kernel versions: we seemingly have bytes left to read from the returned array of dirent structs, but d_reclen is zero: this, and all the subsequent entries, are not valid. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* conf, pasta: Explicitly pass CLONE_{NEWUSER,NEWNET} to setns()Stefano Brivio2022-01-261-2/+4
| | | | | | | Only allow the intended types of namespaces to be joined via setns() as a defensive measure. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Drop <linux/ipv6.h> include, carry own ipv6hdr and opt_hdr definitionsStefano Brivio2022-01-261-2/+0
| | | | | | | This is the only remaining Linux-specific include -- drop it to avoid clang-tidy warnings and to make code more portable. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Add cppcheck target, test, and address resulting warningsStefano Brivio2021-10-211-4/+7
| | | | | | | ...mostly false positives, but a number of very relevant ones too, in tcp_get_sndbuf(), tcp_conn_from_tap(), and siphash PREAMBLE(). Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Fix build with gcc 7, use std=c99, enable some more Clang checkersStefano Brivio2021-10-211-1/+2
| | | | | | | | | | | | | | Unions and structs, you all have names now. Take the chance to enable bugprone-reserved-identifier, cert-dcl37-c, and cert-dcl51-cpp checkers in clang-tidy. Provide a ffsl() weak declaration using gcc built-in. Start reordering includes, but that's not enough for the llvm-include-order checker yet. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Add clang-tidy Makefile target and test, take care of warningsStefano Brivio2021-10-201-1/+1
| | | | | | | Most are just about style and form, but a few were actually serious mistakes (NDP-related). Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Address gcc 11 warningsStefano Brivio2021-10-201-5/+10
| | | | | | | | | A mix of unchecked return values, a missing permission mask for open(2) with O_CREAT, and some false positives from -Wstringop-overflow and -Wmaybe-uninitialized. Reported-by: Martin Hauke <mardnh@gmx.de> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* netlink, conf: Actually get prefix/mask lengthStefano Brivio2021-10-191-2/+7
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* pasta: Allow nanosleep(2) and clock_nanosleep(2) syscalls tooStefano Brivio2021-10-141-1/+1
| | | | | | ...we need those to wait for terminating processes in the namespace. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* netlink, pasta: Configure MTU of tap interface on --config-netStefano Brivio2021-10-141-3/+3
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt, pasta: Completely avoid dynamic memory allocationStefano Brivio2021-10-141-0/+253
Replace libc functions that might dynamically allocate memory with own implementations or wrappers. Drop brk(2) from list of allowed syscalls in seccomp profile. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>