passt - Plug A Simple Socket Transport

	Commit message	Author	Age	Files	Lines
*	pcap: Add an offset argument in pcap_iov()	Laurent Vivier	2024-10-04	1	-2/+3
	The offset is passed directly to pcap_frame() and allows skipping any headers that are not part of the frame to capture. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	util: Add helper to write() all of a buffer	David Gibson	2024-09-18	1	-2/+1
	write(2) might not write all the data it is given. Add a write_all_buf() helper to keep calling it until all the given data is written, or we get an error. Currently we use write_remainder() to do this operation in pcap_frame(). That's a little awkward since it requires constructing an iovec, and future changes we want to make to write_remainder() will be easier in terms of this single buffer helper. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
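The retry loop described above can be sketched as follows. This is a minimal illustration rather than the helper from the commit: the name write_all_buf_sketch(), the EINTR handling and the return convention are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical sketch: write all of buf to fd, retrying after short writes.
 * Returns 0 once everything is written, -1 on error (errno set by write(2)).
 */
static int write_all_buf_sketch(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	assert(buf || !len);

	while (len) {
		ssize_t rc = write(fd, p, len);

		if (rc < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, nothing written: retry */
			return -1;
		}

		/* Short write: advance past what was written and go again */
		p += rc;
		len -= (size_t)rc;
	}

	return 0;
}
```

A caller holding a single contiguous buffer can pass it directly, with no need to build a struct iovec first.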
*	treewide: Replace strerror() calls	Stefano Brivio	2024-06-21	1	-5/+3
	Now that we have logging functions embedding perror() functionality, we can make _some_ calls more terse by using them. In many places, the strerror() calls are still more convenient because, for example, they are used in flow debugging functions, or because the return code variable of interest is not 'errno'. While at it, convert a few error messages from a scant perror style to proper failure descriptions. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	treewide: Standardise variable names for various packet lengths	David Gibson	2024-05-02	1	-7/+7
	At various points we need to track the lengths of a packet including or excluding various different sets of headers. We don't always use the same variable names for doing so. Worse, in some places we use the same name for different things: e.g. tcp_fill_headers[46]() use ip_len for the length including the IP headers, but then tcp_send_flag(), which calls it, uses it to mean the IP payload length only. To improve clarity, standardise on these names:
	  dlen: L4 protocol payload length ("data length")
	  l4len: dlen + length of L4 protocol header
	  l3len: l4len + length of IPv4/IPv6 header
	  l2len: l3len + length of L2 (ethernet) header
	Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
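As a worked illustration of the naming scheme, the sketch below computes all four lengths for a UDP datagram over IPv4 and Ethernet. The struct, the function name and the fixed header sizes (no IP options, no VLAN tag) are assumptions made for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed fixed header sizes for this example */
enum {
	SKETCH_ETH_HLEN  = 14,	/* Ethernet header, no VLAN tag */
	SKETCH_IPV4_HLEN = 20,	/* IPv4 header, no options */
	SKETCH_UDP_HLEN  = 8	/* UDP header */
};

/* Hypothetical container for the standardised length names */
struct frame_lens_sketch {
	size_t dlen;	/* L4 payload ("data") length */
	size_t l4len;	/* dlen + L4 header */
	size_t l3len;	/* l4len + IP header */
	size_t l2len;	/* l3len + Ethernet header */
};

static struct frame_lens_sketch udp4_frame_lens_sketch(size_t dlen)
{
	struct frame_lens_sketch l;

	/* Guard against overflow of the sums below */
	assert(dlen <= SIZE_MAX - SKETCH_ETH_HLEN - SKETCH_IPV4_HLEN
			- SKETCH_UDP_HLEN);

	l.dlen = dlen;
	l.l4len = l.dlen + SKETCH_UDP_HLEN;
	l.l3len = l.l4len + SKETCH_IPV4_HLEN;
	l.l2len = l.l3len + SKETCH_ETH_HLEN;

	return l;
}
```

With a 100-byte payload this gives l4len = 108, l3len = 128 and l2len = 142.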
*	pcap: Use clock_gettime() instead of gettimeofday()	Stefano Brivio	2024-03-14	1	-13/+13
	POSIX.1-2008 declared gettimeofday() as obsolete, but I'm a dinosaur. Usually, C libraries translate that to the clock_gettime() system call anyway, but this doesn't happen in Jon's environment, and, there, seccomp happily kills pasta(1) when started with --pcap, because we didn't add gettimeofday() to our seccomp profiles. Use clock_gettime() instead. Reported-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
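A minimal sketch of the replacement, assuming the capture code only needs a seconds/microseconds pair for each record (as in the classic pcap record header). The struct and function names are hypothetical, and the choice of CLOCK_REALTIME is an assumption.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical timestamp pair, shaped like the start of a pcap record */
struct pcap_ts_sketch {
	uint32_t tv_sec;
	uint32_t tv_usec;
};

/* Fill ts from clock_gettime() rather than the obsolescent gettimeofday().
 * Returns 0 on success, -1 if the clock can't be read.
 */
static int pcap_ts_now_sketch(struct pcap_ts_sketch *ts)
{
	struct timespec now;

	assert(ts);

	if (clock_gettime(CLOCK_REALTIME, &now))
		return -1;

	ts->tv_sec = (uint32_t)now.tv_sec;
	ts->tv_usec = (uint32_t)(now.tv_nsec / 1000);	/* ns to us */

	return 0;
}
```

Calling clock_gettime() directly means only that one system call needs to be allowed by a seccomp profile, whatever the C library does internally for gettimeofday().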
*	pcap: add pcap_iov()	Laurent Vivier	2024-03-06	1	-4/+23
	Introduce a new function pcap_iov() to capture a packet described by an I/O vector. Update pcap_frame() to manage iovcnt > 1. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-ID: <20240303135114.1023026-2-lvivier@redhat.com> [dwg: Fixed trivial cppcheck regressions] Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	pcap: Handle short writes in pcap_frame()	David Gibson	2024-02-29	1	-19/+29
	Currently pcap_frame() assumes that if write() doesn't return an error, it has written everything we want. That's not necessarily true, because it could return a short write. That's not likely to happen on a regular file, but there's not a lot of reason not to be robust here; it's conceivable we might want to direct the pcap fd at a named pipe or similar. So, make pcap_frame() handle short writes by using the write_remainder() helper. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Formatting fix, and avoid gcc warning in pcap_frame()] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	pcap: Update pcap_frame() to take an iovec and offset	David Gibson	2024-02-29	1	-17/+12
	Update the low-level helper pcap_frame() to take a struct iovec and offset within it, rather than an explicit pointer and length for the frame. This moves the handling of an offset (to skip vnet_len) from pcap_multiple() to pcap_frame(). This doesn't accomplish a great deal immediately, but will make subsequent changes easier. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
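To illustrate what "an iovec and an offset within it" means in practice, the sketch below copies the bytes of a frame out of an I/O vector while skipping a leading offset (for instance, a 4-byte length prefix). The helper name and its flat-buffer interface are assumptions for this example, not the pcap_frame() interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical helper: copy up to buflen bytes from iov[] into buf,
 * starting offset bytes into the vector. Returns the bytes copied.
 */
static size_t iov_to_buf_sketch(const struct iovec *iov, size_t iovcnt,
				size_t offset, void *buf, size_t buflen)
{
	char *out = buf;
	size_t copied = 0, i;

	assert(iov || !iovcnt);

	for (i = 0; i < iovcnt && copied < buflen; i++) {
		const char *base = iov[i].iov_base;
		size_t len = iov[i].iov_len;

		if (offset >= len) {	/* element lies entirely before the frame */
			offset -= len;
			continue;
		}

		/* Skip what remains of the offset inside this element */
		base += offset;
		len -= offset;
		offset = 0;

		if (len > buflen - copied)
			len = buflen - copied;

		memcpy(out + copied, base, len);
		copied += len;
	}

	return copied;
}
```

The offset is consumed across elements, so it works whether the skipped header shares a buffer with the frame or sits in an element of its own.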
*	treewide: Use 'z' length modifier for size_t/ssize_t conversions	Stefano Brivio	2023-12-02	1	-2/+2
	Types size_t and ssize_t are not necessarily long: it depends on the architecture. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
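For illustration, the C99 'z' length modifier matches size_t (%zu) and ssize_t (%zd) whatever their width on a given architecture. The function name and message format below are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical formatter: %zu for size_t, %zd for ssize_t, so that no
 * cast to long (and no %lu / %li) is needed on any architecture.
 */
static int format_lengths_sketch(char *out, size_t outlen,
				 size_t len, ssize_t rc)
{
	assert(out && outlen);

	return snprintf(out, outlen, "len=%zu rc=%zd", len, rc);
}
```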
*	passt: Relicense to GPL 2.0, or any later version	Stefano Brivio	2023-04-06	1	-1/+1
	In practical terms, passt doesn't benefit from the additional protection offered by the AGPL over the GPL, because it's not suitable to be executed over a computer network. Further, restricting the distribution under version 3 of the GPL wouldn't provide any practical advantage either, as far as the passt codebase is concerned, and might cause unnecessary compatibility dilemmas. Change licensing terms to the GNU General Public License Version 2, or any later version, with written permission from all current and past contributors, namely: myself, David Gibson, Laine Stump, Andrea Bolognani, Paul Holzinger, Richard W.M. Jones, Chris Kuhn, Florian Weimer, Giuseppe Scrivano, Stefan Hajnoczi, and Vasiliy Ulyanov. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	udp: Use tap_send_frames()	David Gibson	2023-01-23	1	-30/+0
	To send frames on the tap interface, the UDP code uses a fairly complicated two-level batching. First, multiple frames are gathered into a single "message" for the qemu stream socket, then multiple messages are sent with sendmmsg(). We now have tap_send_frames(), which already deals with sending a number of frames, including batching and handling partial sends. Use that to considerably simplify things. This does make a couple of behavioural changes:
	  * We used to split messages to keep them under 32KiB (except when a single frame was longer than that). The comments claim this is needed to stop qemu from closing the connection, but we don't have any equivalent logic for TCP. I wasn't able to reproduce the problem with this series, although it was apparently easy to reproduce earlier. My suspicion is that there was never an inherent need to keep messages small; however, with larger messages (and default kernel buffer sizes) the chances of needing more than one resend for partial send()s are greatly increased. We used not to correctly handle that case of multiple resends, but now we do.
	  * Previously, when we got a partial send on UDP, we would resend the remainder of the entire "message", including multiple frames. The common code now only resends the remainder of a single frame, simply dropping any frames which weren't even partially sent. This is what TCP always did and is probably a better idea for UDP too.
	Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	pcap: Replace pcapm() with pcap_multiple()	David Gibson	2023-01-23	1	-9/+9
	pcapm() captures multiple frames from a msghdr; however, the only thing it cares about in the msghdr is the list of buffers, where it assumes there is one frame to capture per buffer. That's what we want for its single caller, but it's not the only obvious choice here (one frame per msghdr would arguably make more sense in isolation). In addition, pcapm() has logic that only makes sense in the context of the passt-specific path it's called from: it skips the first 4 bytes of each buffer, because those have the qemu vnet_len rather than the frame proper. Make this clearer by replacing pcapm() with pcap_multiple(), which more explicitly takes one struct iovec per frame, and parameterizes how much of each buffer to skip (i.e. the offset of the frame within the buffer). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	pcap: Introduce pcap_frame() helper	David Gibson	2023-01-23	1	-38/+38
	pcap(), pcapm() and pcapmm() duplicate some code, for the actual writing to the capture file. The purpose of pcapm() and pcapmm() not calling pcap() seems to be to avoid repeatedly calling gettimeofday() and to avoid printing errors for every packet in a batch if there's a problem. We can accomplish that while still sharing code by adding a new helper which takes the packet timestamp as a parameter. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	Move logging functions to a new file, log.c	Stefano Brivio	2022-10-14	1	-0/+1
	Logging to file is going to add some further complexity that we don't want to squeeze into util.c. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	conf: Make the argument to --pcap option mandatory	David Gibson	2022-08-30	1	-28/+0
	The --pcap or -p option can be used with or without an argument. If given, the argument gives the name of the file to save a packet trace to. If omitted, we generate a default name in /tmp. Generating the default name isn't particularly useful though, since making a suitable name can easily be done by the caller. Remove this feature. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
*	treewide: Invalid type in argument to printf format specifier, CWE-686	Stefano Brivio	2022-04-05	1	-3/+3
	Harmless except for two bad debugging prints. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	treewide: Fix android-cloexec-* clang-tidy warnings, re-enable checks	Stefano Brivio	2022-03-29	1	-2/+3
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	treewide: Mark constant references as const	Stefano Brivio	2022-03-29	1	-5/+5
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	pcap: Fix mistake in printed string	Stefano Brivio	2022-03-25	1	-1/+1
	Packets are saved to a file, not at it. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	passt, pasta: Namespace-based sandboxing, defer seccomp policy application	Stefano Brivio	2022-02-21	1	-3/+2
	To reach a security level (at least) conceptually equivalent to the one implemented by --enable-sandbox in slirp4netns, we need to create a new mount namespace and pivot_root() into a new (empty) mountpoint, so that passt and pasta can't access any filesystem resource after initialisation. While at it, also detach IPC, PID (only for passt, to prevent vulnerabilities based on the knowledge of a target PID), and UTS namespaces.
	With this approach, if we apply the seccomp filters right after the configuration step, the number of allowed syscalls grows further. To prevent this, defer the application of seccomp policies until after the initialisation phase, right before the main loop: that's where we expect bad things to happen, potentially. This way, we get back to 22 allowed syscalls for passt and 34 for pasta, on x86_64. While at it, move #syscalls notes to specific code paths wherever it conceptually makes sense.
	We have to open all the file handles we'll ever need before sandboxing:
	  - the packet capture file can only be opened once, drop instance numbers from the default path and use the (pre-sandbox) PID instead
	  - /proc/net/tcp{,v6} and /proc/net/udp{,v6}, for automatic detection of bound ports in pasta mode, are now opened only once, before sandboxing, and their handles are stored in the execution context
	  - the UNIX domain socket for passt is also bound only once, before sandboxing: to reject clients after the first one, instead of closing the listening socket, keep it open, accept and immediately discard new connections if we already have a valid one
	Clarify the (unchanged) behaviour for --netns-only in the man page.
	To actually make passt and pasta processes run in a separate PID namespace, we need to unshare(CLONE_NEWPID) before forking to background (if configured to do so). Introduce a small daemon() implementation, __daemon(), that additionally saves the PID file before forking. While running in foreground, the process itself can't move to a new PID namespace (a process can't change the notion of its own PID): mention that in the man page.
	For some reason, fork() in a detached PID namespace causes SIGTERM and SIGQUIT to be ignored, even if the handler is still reported as SIG_DFL: add a signal handler that just exits.
	We can now drop most of the pasta_child_handler() implementation, that took care of terminating all processes running in the same namespace, if pasta started a shell: the shell itself is now the init process in that namespace, and all children will terminate once the init process exits.
	Issuing 'echo $$' in a detached PID namespace won't return the actual namespace PID as seen from the init namespace: adapt demo and test setup scripts to reflect that.
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	passt: Drop <linux/ipv6.h> include, carry own ipv6hdr and opt_hdr definitions	Stefano Brivio	2022-01-26	1	-2/+0
	This is the only remaining Linux-specific include -- drop it to avoid clang-tidy warnings and to make code more portable. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	pcap: Fix failure check on write() in pcapm()	Stefano Brivio	2021-10-21	1	-1/+2
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	passt: Add cppcheck target, test, and address resulting warnings	Stefano Brivio	2021-10-21	1	-2/+4
	...mostly false positives, but a number of very relevant ones too, in tcp_get_sndbuf(), tcp_conn_from_tap(), and siphash PREAMBLE(). Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	passt: Fix build with gcc 7, use std=c99, enable some more Clang checkers	Stefano Brivio	2021-10-21	1	-1/+2
	Unions and structs, you all have names now. Take the chance to enable bugprone-reserved-identifier, cert-dcl37-c, and cert-dcl51-cpp checkers in clang-tidy. Provide a ffsl() weak declaration using gcc built-in. Start reordering includes, but that's not enough for the llvm-include-order checker yet. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	passt: Address gcc 11 warnings	Stefano Brivio	2021-10-20	1	-10/+25
	A mix of unchecked return values, a missing permission mask for open(2) with O_CREAT, and some false positives from -Wstringop-overflow and -Wmaybe-uninitialized. Reported-by: Martin Hauke <mardnh@gmx.de> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	pasta: Allow specifying paths and names of namespaces	Giuseppe Scrivano	2021-10-07	1	-1/+1
	Based on a patch from Giuseppe Scrivano, this adds the ability to:
	  - specify paths and names of target namespaces to join, instead of a PID, also for user namespaces, with --userns
	  - request to join or create a network namespace only, without entering or creating a user namespace, with --netns-only
	  - specify the base directory for netns mountpoints, with --nsrun-dir
	Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> [sbrivio: reworked logic to actually join the given namespaces when they're not created, implemented --netns-only and --nsrun-dir, updated pasta demo script and man page] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	pcap: Drop O_DSYNC from pcap file descriptor	Stefano Brivio	2021-09-27	1	-1/+1
	passt is stable enough, and dropping O_DSYNC reduces the impact of capturing packets on timing, while running tests. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	pcap: Don't make pcap files world-readable	Stefano Brivio	2021-09-27	1	-1/+1
	Even if it's just a debugging feature, it's not nice to leak packets to everybody around. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
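One way to create a capture file that only its owner can read is to pass an explicit permission mask to open(2). The sketch below is an assumption about how that could look: the helper name is hypothetical, and the exact flags used by passt may differ.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: create (or truncate) a capture file readable and
 * writable by the owner only. With O_CREAT, open(2) needs the mode
 * argument: omitting it leaves the resulting permissions undefined.
 */
static int pcap_open_sketch(const char *path)
{
	assert(path);

	return open(path, O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC,
		    S_IRUSR | S_IWUSR);
}
```

The process umask can only clear further bits from this mask, so a newly created file never ends up group- or world-readable.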
*	pcap: Don't reinitialise packet capture if we already have one	Stefano Brivio	2021-09-27	1	-1/+1
	If the guest disconnects, and a given name (without timestamp) for the pcap file is passed, we would otherwise lose the packets captured until that point. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	passt, pasta: Introduce command-line options and port re-mapping	Stefano Brivio	2021-09-01	1	-39/+21
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	udp: Introduce recvmmsg()/sendmmsg(), zero-copy path from socket	Stefano Brivio	2021-07-21	1	-2/+80
	Packets are received directly onto pre-cooked, static buffers for IPv4 (with partial checksum pre-calculation) and IPv6 frames, with pre-filled Ethernet addresses and, partially, IP headers, and sent out from the same buffers with sendmmsg(), for both passt and pasta (non-local traffic only) modes. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	passt: Add PASTA mode, major rework	Stefano Brivio	2021-07-17	1	-7/+27
	PASTA (Pack A Subtle Tap Abstraction) provides quasi-native host connectivity to an otherwise disconnected, unprivileged network and user namespace, similarly to slirp4netns. Given that the implementation is largely overlapping with PASST, no separate binary is built: 'pasta' (and 'passt4netns' for clarity) both link to 'passt', and the mode of operation is selected depending on how the binary is invoked.
	Usage example:
	  $ unshare -rUn
	  # echo $$
	  1871759
	  $ ./pasta 1871759 # From another terminal
	  # udhcpc -i pasta0 2>/dev/null
	  # ping -c1 pasta.pizza
	  PING pasta.pizza (64.190.62.111) 56(84) bytes of data.
	  64 bytes from 64.190.62.111 (64.190.62.111): icmp_seq=1 ttl=255 time=34.6 ms
	  --- pasta.pizza ping statistics ---
	  1 packets transmitted, 1 received, 0% packet loss, time 0ms
	  rtt min/avg/max/mdev = 34.575/34.575/34.575/0.000 ms
	  # ping -c1 spaghetti.pizza
	  PING spaghetti.pizza(2606:4700:3034::6815:147a (2606:4700:3034::6815:147a)) 56 data bytes
	  64 bytes from 2606:4700:3034::6815:147a (2606:4700:3034::6815:147a): icmp_seq=1 ttl=255 time=29.0 ms
	  --- spaghetti.pizza ping statistics ---
	  1 packets transmitted, 1 received, 0% packet loss, time 0ms
	  rtt min/avg/max/mdev = 28.967/28.967/28.967/0.000 ms
	This entails a major rework, especially with regard to the storage of tracked connections and to the semantics of epoll(7) references. Indexing TCP and UDP bindings merely by socket proved to be inflexible and unsuitable to handle different connection flows: pasta also provides Layer-2 to Layer-2 socket mapping between init and a separate namespace for local connections, using a pair of splice() system calls for TCP, and a recvmmsg()/sendmmsg() pair for UDP local bindings.
	For instance, building on the previous example:
	  # ip link set dev lo up
	  # iperf3 -s
	  $ iperf3 -c ::1 -Z -w 32M -l 1024k -P2 | tail -n4
	  [SUM] 0.00-10.00 sec 52.3 GBytes 44.9 Gbits/sec 283 sender
	  [SUM] 0.00-10.43 sec 52.3 GBytes 43.1 Gbits/sec receiver
	  iperf Done.
	epoll(7) references now include a generic part in order to demultiplex data to the relevant protocol handler, using 24 bits for the socket number, and an opaque portion reserved for usage by the single protocol handlers, in order to track sockets back to corresponding connections and bindings. A number of fixes pertaining to TCP state machine and congestion window handling are also included here.
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
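The epoll(7) reference layout mentioned above could be packed into the 64-bit value carried by struct epoll_event (its data.u64 member) roughly as follows. Only the 24-bit socket number and the existence of generic and opaque parts come from the commit message: the field positions, the 8-bit protocol field and all names below are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define REF_SOCK_BITS_SKETCH	24
#define REF_SOCK_MAX_SKETCH	((1U << REF_SOCK_BITS_SKETCH) - 1)

/* Assumed layout: bits 0-23 socket number, bits 24-31 protocol,
 * bits 32-63 opaque data owned by the protocol handler.
 */
static uint64_t ref_pack_sketch(uint8_t proto, uint32_t sock, uint32_t opaque)
{
	assert(sock <= REF_SOCK_MAX_SKETCH);	/* must fit in 24 bits */

	return ((uint64_t)opaque << 32) |
	       ((uint64_t)proto << REF_SOCK_BITS_SKETCH) | sock;
}

/* Generic part: enough to pick the protocol handler and the socket */
static uint8_t ref_proto_sketch(uint64_t ref)
{
	return (uint8_t)(ref >> REF_SOCK_BITS_SKETCH);
}

static uint32_t ref_sock_sketch(uint64_t ref)
{
	return (uint32_t)ref & REF_SOCK_MAX_SKETCH;
}

/* Opaque part: interpreted only by the protocol handler itself */
static uint32_t ref_opaque_sketch(uint64_t ref)
{
	return (uint32_t)(ref >> 32);
}
```

The event loop would read only the generic fields to dispatch an event, then hand the opaque part to the handler, which maps it back to a connection or binding.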
*	passt: Add support for multiple instances in different network namespaces	Stefano Brivio	2021-05-21	1	-2/+12
	...sharing the same filesystem. Instead of a fixed path for the UNIX domain socket, passt now uses a path with a counter, probing for existing instances, and picking the first free one. The demo script is updated accordingly -- it can now be started several times to create multiple namespaces with an instance of passt each, with addressing reflecting separate subnets, and NDP proxying between them. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	passt: Introduce packet capture implementation	Stefano Brivio	2021-05-21	1	-0/+108
	With -DDEBUG, passt now saves guest-side traffic captures in pcap format at /tmp/passt_<ISO8601 timestamp>.pcap. The timestamp refers to time and date of start-up. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
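For reference, a capture file in the classic pcap format starts with a 24-byte global header. The sketch below fills one in for Ethernet frames: the field layout and constants follow the published pcap file format, while the struct and function names are hypothetical and not taken from passt.

```c
#include <assert.h>
#include <stdint.h>

/* Classic pcap global header (24 bytes), written in host byte order:
 * readers detect the byte order from the magic number.
 */
struct pcap_file_hdr_sketch {
	uint32_t magic;		/* 0xa1b2c3d4: microsecond timestamps */
	uint16_t version_major;	/* 2 */
	uint16_t version_minor;	/* 4 */
	int32_t  thiszone;	/* GMT to local time correction, usually 0 */
	uint32_t sigfigs;	/* timestamp accuracy, usually 0 */
	uint32_t snaplen;	/* maximum bytes stored per packet */
	uint32_t linktype;	/* 1: Ethernet */
};

static struct pcap_file_hdr_sketch pcap_file_hdr_init_sketch(uint32_t snaplen)
{
	struct pcap_file_hdr_sketch h = {
		.magic = 0xa1b2c3d4,
		.version_major = 2,
		.version_minor = 4,
		.thiszone = 0,
		.sigfigs = 0,
		.snaplen = snaplen,
		.linktype = 1,
	};

	assert(snaplen > 0);

	return h;
}
```

The header is written once at the start of the file, and each captured frame then follows with its own per-record header (timestamp, captured length, original length).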