aboutgitcodebugslistschat
Commit message (Collapse)AuthorAgeFilesLines
* passt.1: Add David to AUTHORS2022_10_15.b3f3591Stefano Brivio2022-10-151-2/+2
| | | | | | | I just realised while reading the man page. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* conf: Bind inbound ports with CAP_NET_BIND_SERVICE before isolate_user()Stefano Brivio2022-10-155-63/+98
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Even if CAP_NET_BIND_SERVICE is granted, we'll lose the capability in the target user namespace as we isolate the process, which means we're unable to bind to low ports at that point. Bind inbound ports, and only those, before isolate_user(). Keep the handling of outbound ports (for pasta mode only) after the setup of the namespace, because that's where we'll bind them. To this end, initialise the netlink socket for the init namespace before isolate_user() as well, as we actually need to know the addresses of the upstream interface before binding ports, in case they're not explicitly passed by the user. As we now call nl_sock_init() twice, checking its return code from conf() twice looks a bit heavy: make it exit(), instead, as we can't do much if we don't have netlink sockets. While at it: - move the v4_only && v6_only options check just after the first option processing loop, as this is more strictly related to option parsing proper - update the man page, explaining that CAP_NET_BIND_SERVICE is *not* the preferred way to bind ports, because passt and pasta can be abused to allow other processes to make effective usage of it. Add a note about the recommended sysctl instead - simplify nl_sock_init_do() now that it's called once for each case Reported-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Rename pasta_setup_ns() to pasta_spawn_cmd()David Gibson2022-10-151-9/+9
| | | | | | | | | | pasta_setup_ns() no longer has much to do with setting up a namespace. Instead it's really about starting the shell or other command we want to run with pasta connectivity. Rename it and its argument structure to be less misleading. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* isolation: Only configure UID/GID mappings in userns when spawning shellDavid Gibson2022-10-154-16/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When in passt mode, or pasta mode spawning a command, we create a userns for ourselves. This is used both to isolate the pasta/passt process itself and to run the spawned command, if any. Since eed17a47 "Handle userns isolation and dropping root at the same time" we've handled both cases the same, configuring the UID and GID mappings in the new userns to map whichever UID we're running as to root within the userns. This mapping is desirable when spawning a shell or other command, so that the user gets a root shell with reasonably clear abilities within the userns and netns. It's not necessarily essential, though. When not spawning a shell, it doesn't really have any purpose: passt itself doesn't need to be root and can operate fine with an unmapped user (using some of the capabilities we get when entering the userns instead). Configuring the uid_map can cause problems if passt is running with any capabilities in the initial namespace, such as CAP_NET_BIND_SERVICE to allow it to forward low ports. In this case the kernel makes files in /proc/pid owned by root rather than the starting user to prevent the user from interfering with the operation of the capability-enhanced process. This includes uid_map meaning we are not able to write to it. Whether this behaviour is correct in the kernel is debatable, but in any case we might as well avoid problems by only initializing the user mappings when we really want them. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* isolation: Prevent any child processes gaining capabilitiesDavid Gibson2022-10-151-0/+56
| | | | | | | | | | | | | | We drop our own capabilities, but it's possible that processes we exec() could gain extra privilege via file capabilities. It shouldn't be possible for us to exec() anyway due to seccomp() and our filesystem isolation. But just in case, zero the bounding and inheritable capability sets to prevent any such child from gainin privilege. Note that we do this *after* spawning the pasta shell/command (if any), because we do want the user to be able to give that privilege if they want. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* isolation: Replace drop_caps() with a version that actually does somethingDavid Gibson2022-10-153-11/+92
| | | | | | | | | | | | | | | | | | | | The current implementation of drop_caps() doesn't really work because it attempts to drop capabilities from the bounding set. That's not the set that really matters, it's about limiting the abilities of things we might later exec() rather than our own capabilities. It also requires CAP_SETPCAP which we won't usually have. Replace it with a new version which uses setcap(2) to drop capabilities from the effective and permitted sets. For now we leave the inheritable set as is, since we don't want to preclude the user from passing inheritable capabilities to the command spawed by pasta. Correctly dropping caps reveals that we were relying on some capabilities we'd supposedly dropped. Re-divide the dropping of capabilities between isolate_initial(), isolate_user() and isolate_prefork() to make this work. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* isolation: Refactor isolate_user() to allow for a common exit pathDavid Gibson2022-10-151-24/+16
| | | | | | | | | | | | | | Currently, isolate_user() exits early if the --netns-only option is given. That works for now, but shortly we're going to want to add some logic to go at the end of isolate_user() that needs to run in all cases: joining a given userns, creating a new userns, or staying in our original userns (--netns-only). To avoid muddying those changes, here we reorganize isolate_user() to have a common exit path for all cases. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Replace FWRITE with a functionDavid Gibson2022-10-154-22/+45
| | | | | | | | | | | | | | In a few places we use the FWRITE() macro to open a file, replace it's contents with a given string and close it again. There's no real reason this needs to be a macro rather than just a function though. Turn it into a function 'write_file()' and make some ancillary cleanups while we're there: - Add a return code so the caller can handle giving a useful error message - Handle the case of short write()s (unlikely, but possible) - Add O_TRUNC, to make sure we replace the existing contents entirely Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* isolation: Clarify various self-isolation stepsDavid Gibson2022-10-153-13/+86
| | | | | | | | | | We have a number of steps of self-isolation scattered across our code. Improve function names and add comments to make it clearer what the self isolation model is, what the steps do, and why they happen at the points they happen. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Remove unhelpful drop_caps() call in pasta_start_ns()David Gibson2022-10-151-2/+0
| | | | | | | | | | | | | | | | | | | | drop_caps() has a number of bugs which mean it doesn't do what you'd expect. However, even if we fixed those, the call in pasta_start_ns() doesn't do anything useful: * In the common case, we're UID 0 at this point. In this case drop_caps() doesn't accomplish anything, because even with capabilities dropped, we are still privileged. * When attaching to an existing namespace with --userns or --netns-only we might not be UID 0. In this case it's too early to drop all capabilities: we need at least CAP_NET_ADMIN to configure the tap device in the namespace. Remove this call - we will still drop capabilities a little later in sandbox(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* pasta_start_ns() always ends in parent contextDavid Gibson2022-10-151-4/+1
| | | | | | | | | | | The end of pasta_start_ns() has a test against pasta_child_pid, testing if we're in the parent or the child. However we started the child running the pasta_setup_ns function which always exec()s or exit()s, so if we return from the clone() we are always in the parent, making that test unnecessary. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* pasta: More general way of starting spawned shell as a login shellDavid Gibson2022-10-151-12/+20
| | | | | | | | | | | | | | When invoked so as to spawn a shell, pasta checks explicitly for the shell being bash and if so, adds a "-l" option to make it a login shell. This is not ideal, since this is a bash specific option and requires pasta to know about specific shell variants. There's a general convention for starting a login shell, which is to prepend a "-" to argv[0]. Use this approach instead, so we don't need bash specific logic. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test: Move slower tests to end of test runDavid Gibson2022-10-151-10/+10
| | | | | | | | | | The distro and performance tests are by far the slowest part of the passt testsuite. Move them to the end of the testsuite run, so that it's easier to do a quick test during development by letting the other tests run then interrupting the test runner. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* log.h: Avoid unnecessary GNU extension for token pastingStefano Brivio2022-10-151-2/+2
| | | | | | | | | | | | clang says: ./log.h:23:18: warning: token pasting of ',' and __VA_ARGS__ is a GNU extension [-Wgnu-zero-variadic-macro-arguments] We need token pasting here just because of the 'format' in trace(): drop it. Suggested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* util.h: Add missing gcc pragma push before pragma popStefano Brivio2022-10-151-0/+1
| | | | | | | | | While building with clang: ./util.h:176:24: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas] Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* icmp: Set sin6_scope_id for outbound ICMPv6 echo requestsStefano Brivio2022-10-151-0/+1
| | | | | | | | | | | If we ping a link-local address, we need to pass this to sendto(), as it will obviously fail with -EINVAL otherwise. If we ping other addresses, it's probably a good idea anyway to specify the configured outbound interface here. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* conf: Drop excess colons in usage for DHCP and DNS optionsStefano Brivio2022-10-151-4/+4
| | | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* netlink: Disable duplicate address detection for configured IPv6 addressStefano Brivio2022-10-151-0/+3
| | | | | | | | | | | | | | | | | | With default options, when we pass --config-net, the IPv6 address is actually going to be recycled from the init namespace, so it is in fact duplicated, but duplicate address detection has no way to find out. With a different configured address, that's not the case, but anyway duplicate address detection will be unable to see this. In both cases, we're wasting time for nothing. Pass the IFA_F_NODAD flag as we configure globally scoped IPv6 addresses via netlink. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* Don't create 'tap' socket for ports that are bound to loopback onlyStefano Brivio2022-10-152-69/+117
| | | | | | | | | | | | | | | | | | | | | If the user specifies an explicit loopback address for a port binding, we're going to use that address for the 'tap' socket, and the same exact address for the 'spliced' socket (because those are, by definition, only bound to loopback addresses). This means that the second binding will fail, and, unexpectedly, the port is forwarded, but via tap device, which means the source address in the namespace won't be a loopback address. Make it explicit under which conditions we're creating which kind of socket, by refactoring tcp_sock_init() into two separate functions for IPv4 and IPv6 and gathering those conditions at the beginning. Also, don't create spliced sockets if the user specifies explicitly a non-loopback address, those are harmless but not desired either. Fixes: 3c6ae625101a ("conf, tcp, udp: Allow address specification for forwarded ports") Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp, tcp_splice: Fix port remapping for inbound, spliced connectionsStefano Brivio2022-10-153-11/+18
| | | | | | | | | | | | | | | | | | | | | | | In pasta mode, when we receive a new inbound connection, we need to select a socket that was created in the namespace to proceed and connect() it to its final destination. The existing condition might pick a wrong socket, though, if the destination port is remapped, because we'll check the bitmap of inbound ports using the remapped port (stored in the epoll reference) as index, and not the original port. Instead of using the port bitmap for this purpose, store this information in the epoll reference itself, by adding a new 'outbound' bit, that's set if the listening socket was created the namespace, and unset otherwise. Then, use this bit to pick a socket on the right side. Suggested-by: David Gibson <david@gibson.dropbear.id.au> Fixes: 33482d5bf293 ("passt: Add PASTA mode, major rework") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* tcp, tcp_splice: Adjust comments to current meaning of inbound and outboundStefano Brivio2022-10-152-2/+2
| | | | | | | | | | | | | | | | | | | | For tcp_sock_init_ns(), "inbound" connections used to be the ones being established toward any listening socket we create, as opposed to sockets we connect(). Similarly, tcp_splice_new() used to handle "inbound" connections in the sense that they originated from listening sockets, and they would in turn cause a connect() on an "outbound" socket. Since commit 1128fa03fe73 ("Improve types and names for port forwarding configuration"), though, inbound connections are more broadly defined as the ones directed to guest or namepsace, and outbound the ones originating from there. Update comments for those two functions. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* udp: Fix port and address checks for DNS forwarderStefano Brivio2022-10-151-3/+3
| | | | | | | | | | | | | | | | | | | First off, as we swap endianness for source ports in udp_fill_data_v{4,6}(), we want host endianness, not network endianness. It doesn't actually matter if we use htons() or ntohs() here, but the current version is confusing. In the IPv4 path, when we remap DNS answers, we already swapped the endianness as needed for the source port: don't swap it again, otherwise we'll not map DNS answers for IPv4. In the IPv6 path, when we remap DNS answers, we want to check that they came from our upstream DNS server, not the one configured via --dns-forward (which doesn't even need to exist for this functionality to work). Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* tap: Don't check sequence counts when adding packets to poolStefano Brivio2022-10-151-6/+6
| | | | | | | | | | | This is a minor optimisation possibility I spotted while trying to debug a hang in tap4_handler(): if we run out of space for packet sequences, it's fine to add packets to an existing per-sequence pool. We should check the count of packet sequences only once we realise that we actually need a new packet sequence. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* packet: Fix off-by-one in packet_get_do() sanity checksStefano Brivio2022-10-151-1/+1
| | | | | | | | | | | | | An n-sized pool, or a pool with n entries, doesn't include index n, only up to n - 1. I'm not entirely sure this sanity check actually covers any practical case, but I spotted this while debugging a hang in tap4_handler() (possibly due to malformed sequence entries from qemu). Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* conf: Report usage for --no-netns-quitStefano Brivio2022-10-151-0/+2
| | | | | | Fixes: 745a9ba4284c ("pasta: By default, quit if filesystem-bound net namespace goes away") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* conf, tcp, udp: Allow specification of interface to bind toStefano Brivio2022-10-159-47/+88
| | | | | | | | | | | | | | | | Since kernel version 5.7, commit c427bfec18f2 ("net: core: enable SO_BINDTODEVICE for non-root users"), we can bind sockets to interfaces, if they haven't been bound yet (as in bind()). Introduce an optional interface specification for forwarded ports, prefixed by %, that can be passed together with an address. Reported use case: running local services that use ports we want to have externally forwarded: https://github.com/containers/podman/issues/14425 Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* conf, tap: Add option to quit once the client closes the connectionStefano Brivio2022-10-154-1/+27
| | | | | | | | This is practical to avoid explicit lifecycle management in users, e.g. libvirtd, and is trivial to implement. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* util: Check return value of lseek() while reading bound ports from procfsStefano Brivio2022-10-151-3/+7
| | | | | | | | | Coverity now noticed we're checking most lseek() return values, but not this. Not really relevant, but it doesn't hurt to check we can actually seek before reading lines. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* conf, log, Makefile: Add versioning informationStefano Brivio2022-10-156-3/+26
| | | | | | | Add a --version option displaying that, and also include this information in the log files. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* log: Add missing function comment for trace_init()Stefano Brivio2022-10-141-0/+4
| | | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* log, conf: Add support for logging to fileStefano Brivio2022-10-147-29/+316
| | | | | | | | | | | | | | | | | | | | | | In some environments, such as KubeVirt pods, we might not have a system logger available. We could choose to run in foreground, but this takes away the convenient synchronisation mechanism derived from forking to background when interfaces are ready. Add optional logging to file with -l/--log-file and --log-size. Unfortunately, this means we need to duplicate features that are more appropriately implemented by a system logger, such as rotation. Keep that reasonably simple, by using fallocate() with range collapsing where supported (Linux kernel >= 3.15, extent-based ext4 and XFS) and falling back to an unsophisticated block-by-block moving of entries toward the beginning of the file once we reach the (mandatory) size limit. While at it, clarify the role of LOG_EMERG in passt.c. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* passt.h: Include netinet/if_ether.h before struct ctx declarationStefano Brivio2022-10-141-0/+2
| | | | | | | | This saves some hassle when including passt.h, as we need ETH_ALEN there. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* conf: Drop duplicate, diverging optstring assignmentsStefano Brivio2022-10-141-14/+6
| | | | | | | | | | | | | This originated as a result of copy and paste to introduce a second stage for processing options related to port forwarding, has already bitten David in the past, and just gave me hours of fun. As a matter of fact, the second set of optstring assignments was already incorrect, but it didn't matter because the first one was more restrictive, not allowing optional arguments for -P, -D, -S. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* Move logging functions to a new file, log.cStefano Brivio2022-10-1420-138/+187
| | | | | | | | Logging to file is going to add some further complexity that we don't want to squeeze into util.c. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* test: Add rudimentary support to run selected tests onlyStefano Brivio2022-10-143-2/+53
To keep this simple, only support tests that have corresponding setup and teardown functions implied by their path. For example: ./run passt/ndp will trigger the 'passt' setup and teardown functions. This is not really elegant, but it looks robust, and while David is considering proper alternatives, it should be quite useful. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>