passt - Plug A Simple Socket Transport

	Commit message (Collapse)	Author	Age	Files	Lines
*	passt, util: Close any open file that the parent might have leaked	Stefano Brivio	2024-08-08	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a parent accidentally or due to implementation reasons leaks any open file, we don't want to have access to them, except for the file passed via --fd, if any. This is the case for Podman when Podman's parent leaks files into Podman: it's not practical for Podman to close unrelated files before starting pasta, as reported by Paul. Use close_range(2) to close all open files except for standard streams and the one from --fd. Given that parts of conf() depend on other files to be already opened, such as the epoll file descriptor, we can't easily defer this to a more convenient point, where --fd was already parsed. Introduce a minimal, duplicate version of --fd parsing to keep this simple. As we need to check that the passed --fd option doesn't exceed INT_MAX, because we'll parse it with strtol() but file descriptor indices are signed ints (regardless of the arguments close_range() take), extend the existing check in the actual --fd parsing in conf(), also rejecting file descriptors numbers that match standard streams, while at it. Suggested-by: Paul Holzinger <pholzing@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Paul Holzinger <pholzing@redhat.com>
*	log, passt: Keep printing to stderr when passt is running in foreground2024_08_06.ee36266	Stefano Brivio	2024-08-06	1	-5/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are two cases where we want to stop printing to stderr: if it's closed, and if pasta spawned a shell (and --debug wasn't given). But if passt is running in foreground, we currently stop to report any message, even error messages, once we're ready, as reported by Laurent, because we set the log_runtime flag, which we use to indicate we're ready, regardless of whether we're running in foreground or not. Turn that flag (back) to log_stderr, and set it only when we really want to stop printing to stderr. Reported-by: Laurent Vivier <lvivier@redhat.com> Fixes: afd9cdc9bb48 ("log, passt: Always print to stderr before initialisation is complete") Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	log: Fetch log times with CLOCK_MONOTONIC, not CLOCK_REALTIME	Stefano Brivio	2024-07-26	1	-1/+1
\| \| \| \| \| \| \| \| \|	We report relative timestamps in logs, so we want to avoid jumps in the system time. Suggested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	log: Initialise timestamp for relative log time also if we use a log file	Stefano Brivio	2024-07-26	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \|	...not just for debug messages. Otherwise, timestamps in the log file are consistent but the starting point is not zero. Do this right away as we enter main(), so that the resulting timestamps are as closely as possible relative to when we start. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	udp: Rename UDP listening sockets	David Gibson	2024-07-19	1	-3/+3
\| \| \| \| \| \| \| \| \| \|	EPOLL_TYPE_UDP is now only used for "listening" sockets; long lived sockets which can initiate new flows. Rename to EPOLL_TYPE_UDP_LISTEN and associated functions to match. Along with that, remove the .orig field from union udp_listen_epoll_ref, since it is now always true. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	udp: Handle "spliced" datagrams with per-flow sockets	David Gibson	2024-07-19	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When forwarding a datagram to a socket, we need to find a socket with a suitable local address to send it. Currently we keep track of such sockets in an array indexed by local port, but this can't properly handle cases where we have multiple local addresses in active use. For "spliced" (socket to socket) cases, improve this by instead opening a socket specifically for the target side of the flow. We connect() as well as bind()ing that socket, so that it will only receive the flow's reply packets, not anything else. We direct datagrams sent via that socket using the addresses from the flow table, effectively replacing bespoke addressing logic with the unified logic in fwd.c When we create the flow, we also take a duplicate of the originating socket, and use that to deliver reply datagrams back to the origin, again using addresses from the flow table entry. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	conf, passt: Don't call __openlog() if a log file is used	Stefano Brivio	2024-06-21	1	-4/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a log file is configured, we would otherwise open a connection to the system logger (if any), print any message that we might have before we initialise the log file, and then keep that connection around for no particular reason. Call __openlog() as an alternative to the log file setup, instead. This way, we might skip printing some messages during the initialisation phase, but they're probably not really valuable to have in a system log, and we're going to print them to standard error anyway. Suggested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	treewide: Replace strerror() calls	Stefano Brivio	2024-06-21	1	-8/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that we have logging functions embedding perror() functionality, we can make _some_ calls more terse by using them. In many places, the strerror() calls are still more convenient because, for example, they are used in flow debugging functions, or because the return code variable of interest is not 'errno'. While at it, convert a few error messages from a scant perror style to proper failure descriptions. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	treewide: Replace perror() calls with calls to logging functions	Stefano Brivio	2024-06-21	1	-25/+16
\| \| \| \| \| \| \| \| \| \| \| \|	perror() prints directly to standard error, but in many cases standard error might be already closed, or we might want to skip logging, based on configuration. Our logging functions provide all that. While at it, make errors more descriptive, replacing some of the existing basic perror-style messages. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	log, passt: Always print to stderr before initialisation is complete	Stefano Brivio	2024-06-21	1	-6/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After commit 15001b39ef1d ("conf: set the log level much earlier"), we had a phase during initialisation when messages wouldn't be printed to standard error anymore. Commit f67238aa864d ("passt, log: Call __openlog() earlier, log to stderr until we detach") fixed that, but only for the case where no log files are given. If a log file is configured, vlogmsg() will not call passt_vsyslog(), but during initialisation, LOG_PERROR is set, so to avoid duplicated prints (which would result from passt_vsyslog() printing to stderr), we don't call fprintf() from vlogmsg() either. This is getting a bit too complicated. Instead of abusing LOG_PERROR, define an internal logging flag that clearly represents that we're not done with the initialisation phase yet. If this flag is not set, make sure we always print to stderr, if the log mask matches. Reported-by: Yalan Zhang <yalzhang@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	conf, passt: Make --stderr do nothing, and deprecate it	Stefano Brivio	2024-06-21	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The original behaviour of printing messages to standard error by default when running from a non-interactive terminal was introduced because the first KubeVirt integration draft used to start passt in foreground and get messages via standard error. For development purposes, the system logger was more convenient at that point, and passt was running from interactive terminals only if not started by the KubeVirt integration. This behaviour was introduced by 84a62b79a2bc ("passt: Also log to stderr, don't fork to background if not interactive"). Later, I added command-line options in 1e49d194d017 ("passt, pasta: Introduce command-line options and port re-mapping") and accidentally reversed this condition, which wasn't a problem as --stderr could force printing to standard error anyway (and it was used by KubeVirt). Nowadays, the KubeVirt integration uses a log file (requested via libvirt configuration), and the same applies for Podman if one actually needs to look at runtime logs. There are no use cases left, as far as I know, where passt runs in foreground in non-interactive terminals. Seize the chance to reintroduce some sanity here. If we fork to background, standard error is closed, so --stderr is useless in that case. If we run in foreground, there's no harm in printing messages to standard error, and that accidentally became the default behaviour anyway, so --stderr is not needed in that case. It would be needed for non-interactive terminals, but there are no use cases, and if there were, let's log to standard error anyway: the user can always redirect standard error to /dev/null if needed. Before we're up and running, we need to print to standard error anyway if something happens, otherwise we can't report failure to start in any kind of usage, stand-alone or in integrations. So, make --stderr do nothing, and deprecate it. While at it, drop a left-over comment about --foreground being the default only for interactive terminals, because it's not the case anymore. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	conf, passt: Don't try to log to stderr after we close it	Stefano Brivio	2024-06-21	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	If we don't run in foreground, we close standard error as we daemonise, so it makes no sense to check if the controlling terminal is an interactive terminal or if --force-stderr was given, to decide if we want to log to standard error. Make --force-stderr depend on --foreground. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	vhost-user: compare mode MODE_PASTA and not MODE_PASST	Laurent Vivier	2024-06-13	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	As we are going to introduce the MODE_VU that will act like the mode MODE_PASST, compare to MODE_PASTA rather than to add a comparison to MODE_VU when we check for MODE_PASST. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	udp: rename udp_sock_handler() to udp_buf_sock_handler()	Laurent Vivier	2024-06-13	1	-1/+1
\| \| \| \| \| \| \| \| \|	We are going to introduce a variant of the function to use vhost-user buffers rather than passt internal buffers. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	Remove pointless macro parameters in CALL_PROTO_HANDLER	David Gibson	2024-06-07	1	-3/+3
\| \| \| \| \| \| \| \|	The 'c' parameter is always passed exactly 'c'. The 'now' parameter is always passed exactly 'now'. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	conf, passt, tap: Open socket and PID files before switching UID/GID	Stefano Brivio	2024-05-23	1	-6/+4
\| \| \| \| \| \| \| \| \| \| \| \|	Otherwise, if the user runs us as root, and gives us paths that are only accessible by root, we'll fail to open them, which might in turn encourage users to change permissions or ownerships: definitely a bad idea in terms of security. Reported-by: Minxi Hou <mhou@redhat.com> Reported-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: Richard W.M. Jones <rjones@redhat.com>
*	passt, util: Move opening of PID file to its own function	Stefano Brivio	2024-05-23	1	-9/+2
\| \| \| \| \| \| \|	We won't call it from main() any longer: move it. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
*	util: Rename write_pidfile() to pidfile_write()	Stefano Brivio	2024-05-23	1	-1/+1
\| \| \| \| \| \| \| \|	As I'm adding pidfile_open() in the next patch. The function comment didn't match, by the way. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
*	passt, tap: Don't use -1 as uninitialised value for fd_tap_listen	Stefano Brivio	2024-05-23	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a remnant from the time we kept access to the original filesystem and we could reinitialise the listening AF_UNIX socket. Since commit 0515adceaa8f ("passt, pasta: Namespace-based sandboxing, defer seccomp policy application"), however, we can't re-bind the listening socket once we're up and running. Drop the -1 initalisation and the corresponding check. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	passt.c: explicitly include libgen.h for basename	lemmi	2024-05-13	1	-0/+1
\| \| \| \| \| \| \| \|	fixes implicit declaration warning on musl Signed-off-by: lemmi <lemmi@nerd2nerd.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	treewide: Compilers' name for armv6l and armv7l is "arm"	Stefano Brivio	2024-04-11	1	-3/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	When I switched from 'uname -m' to 'gcc -dumpmachine' to fetch the architecture name for, among others, seccomp.sh, I didn't realise that "armv6l" and "armv7l" are just Linux kernel names -- compilers just call that "arm". Fix the "syscalls" annotation we use to define seccomp profiles accordingly, otherwise pasta will be terminated on sigreturn() on armv6l and armv7l. Fixes: 213c397492bd ("passt, pasta: Run-time selection of AVX2 build") Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	passt, log: Call __openlog() earlier, log to stderr until we detach	Stefano Brivio	2024-03-14	1	-7/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Paul reports that, with commit 15001b39ef1d ("conf: set the log level much earlier"), early messages aren't reported to standard error anymore. The reason is that, once the log mask is changed from LOG_EARLY, we don't force logging to stderr, and this mechanism was abused to have early errors on stderr. Now that we drop LOG_EARLY earlier on, this doesn't work anymore. Call __openlog() as soon as we know the mode we're running as, using LOG_PERROR. Then, once we detach, if we're not running from an interactive terminal and logging to standard error is not forced, drop LOG_PERROR from the options. While at it, check if the standard error descriptor refers to a terminal, instead of checking standard output: if the user redirects standard output to /dev/null, they might still want to see messages from standard error. Further, make sure we don't print messages to standard error reporting that we couldn't log to the system logger, if we didn't open a connection yet. That's expected. Reported-by: Paul Holzinger <pholzing@redhat.com> Fixes: 15001b39ef1d ("conf: set the log level much earlier") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	icmp: Use 'flowside' epoll references for ping sockets	David Gibson	2024-03-12	1	-7/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently ping sockets use a custom epoll reference type which includes the ICMP id. However, now that we have entries in the flow table for ping flows, finding that is sufficient to get everything else we want, including the id. Therefore remove the icmp_epoll_ref type and use the general 'flowside' field for ping sockets. Having done this we no longer need separate EPOLL_TYPE_ICMP and EPOLL_TYPE_ICMPV6 reference types, because we can easily determine which case we have from the flow type. Merge both types into EPOLL_TYPE_PING. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	icmp: Store ping socket information in flow table	David Gibson	2024-03-12	1	-5/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently icmp_id_map[][] stores information about ping sockets in a bespoke structure. Move the same information into new types of flow in the flow table. To match that change, replace the existing ICMP timer with a flow-based timer for expiring ping sockets. This has the advantage that we only need to scan the active flows, not all possible ids. We convert icmp_id_map[][] to point to the flow table entries, rather than containing its own information. We do still use that array for locating the right ping flows, rather than using a "flow native" form of lookup for the time being. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Update id_sock description in comment to icmp_ping_new()] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	conf: set the log level much earlier	Paul Holzinger	2024-02-27	1	-10/+0
\| \| \| \| \| \| \| \| \| \| \| \| \|	--quiet is supposed to silence the "No routable interface" message but it does not work because the log level was set long after conf_ip4/6() was called which means it uses the default level which logs everything. To address this move the log level logic directly after the option parsing in conf(). Signed-off-by: Paul Holzinger <pholzing@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	passt: make --quiet set the log level to warning	Paul Holzinger	2024-02-27	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Based on the man page and help output --quiet hides informational messages. This means that warnings should still be logged. This was discussed in[1]. [1] https://archives.passt.top/passt-dev/20240216114304.7234a83f@elisabeth/T/#m42652824644973674e84baf9e0bf1d0e88104450 Signed-off-by: Paul Holzinger <pholzing@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	pasta: Add fallback timer mechanism to check if namespace is gone	Stefano Brivio	2024-02-16	1	-15/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We don't know how frequently this happens, but hitting fs.inotify.max_user_watches or similar sysctl limits is definitely not out of question, and Paul mentioned that, for example, Podman's CI environments hit similar issues in the past. Introduce a fallback mechanism based on a timer file descriptor: we grab the directory handle at startup, and we can then use openat(), triggered periodically, to check if the (network) namespace directory still exists. If openat() fails at some point, exit. Link: https://github.com/containers/podman/pull/21563#issuecomment-1943505707 Reported-by: Paul Holzinger <pholzing@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	log: setlogmask(0) can actually result in a system call, don't use it	Stefano Brivio	2024-02-14	1	-5/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Before commit 32d07f5e59f2 ("passt, pasta: Completely avoid dynamic memory allocation"), we didn't store the current log mask in a variable, and we fetched it using setlogmask(0) wherever needed. But after that commit, we can use our log_mask copy instead. And we should: with recent glibc versions, setlogmask(0) actually results in a system call, which causes a substantial overhead with high transfer rates: we use setlogmask(0) even to decide we don't want to print debug messages. Now that we rely on log_mask in early stages, before setlogmask() is called, we need to initialise that variable to the special LOG_EMERG mask value right away: define LOG_EARLY to make this clearer, and, while at it, group conditions in vlogmsg() into something more terse. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	icmp: Consolidate icmp_sock_handler() with icmpv6_sock_handler()	David Gibson	2024-01-22	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \|	Currently we have separate handlers for ICMP and ICMPv6 ping replies. Although there are a number of points of difference, with some creative refactoring we can combine these together sensibly. Although it doesn't save a vast amount of code, it does make it clearer that we're performing basically the same steps for each case. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	flow: Avoid moving flow entries to compact table	David Gibson	2024-01-22	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently we always keep the flow table maximally compact: that is all the active entries are contiguous at the start of the table. Doing this sometimes requires moving an entry when one is freed. That's kind of fiddly, and potentially expensive: it requires updating the hash table for the new location, and depending on flow type, it may require EPOLL_CTL_MOD, system calls to update epoll tags with the new location too. Implement a new way of managing the flow table that doesn't ever move entries. It attempts to maintain some compactness by always using the first free slot for a new connection, and mitigates the effect of non compactness by cheaply skipping over contiguous blocks of free entries. See the "theory of operation" comment in flow.c for details. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>b [sbrivio: additional ASSERT(flow_first_free <= FLOW_MAX - 2) to avoid Coverity Scan false positive] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	tcp, tcp_splice: Avoid double layered dispatch for connected TCP sockets	David Gibson	2024-01-22	1	-2/+6
\| \| \| \| \| \| \| \| \| \| \|	Currently connected TCP sockets have the same epoll type, whether they're for a "tap" connection or a spliced connection. This means that tcp_sock_handler() has to do a secondary check on the type of the connection to call the right function. We can avoid this by adding a new epoll type and dispatching directly to the right thing. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	epoll: Better handling of number of epoll types	David Gibson	2024-01-22	1	-1/+3
\| \| \| \| \| \| \| \| \|	As we already did for flow types, use an "EPOLL_NUM_TYPES" isntead of EPOLL_TYPE_MAX, which is a little bit safer and clearer. Add a static assert on the size of the matching names array. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	flow, tcp: Add handling for per-flow timers	David Gibson	2024-01-22	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	tcp_timer() scans the flow table so that it can run tcp_splice_timer() on each spliced connection. More generally, other flow types might want to run similar timers in future. We could add a flow_timer() analagous to tcp_timer(), udp_timer() etc. However, this would need to scan the flow table, which we would have just done in flow_defer_handler(). We'd prefer to just scan the flow table once, dispatching both per-flow deferred events and per-flow timed events if necessary. So, extend flow_defer_handler() to do this. For now we use the same timer interval for all flow types (1s). We can make that more flexible in future if we need to. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	flow, tcp: Add flow-centric dispatch for deferred flow handling	David Gibson	2024-01-22	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	tcp_defer_handler(), amongst other things, scans the flow table and does some processing for each TCP connection. When we add other protocols to the flow table, they're likely to want some similar scanning. It makes more sense for cache friendliness to perform a single scan of the flow table and dispatch to the protocol specific handlers, rather than having each protocol separately scan the table. To that end, add a new flow_defer_handler() handling all flow-linked deferred operations. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	tcp: "TCP" hash secret doesn't need to be TCP specific	David Gibson	2023-12-04	1	-0/+40
\| \| \| \| \| \| \| \| \| \| \| \| \|	The TCP state structure includes a 128-bit hash_secret which we use for SipHash calculations to mitigate attacks on the TCP hash table and initial sequence number. We have plans to use SipHash in places that aren't TCP related, and there's no particular reason they'd need their own secret. So move the hash_secret to the general context structure. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	tcp, udp: Don't pre-fill IPv4 destination address in headers	David Gibson	2023-08-22	1	-6/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Because packets sent on the tap interface will always be going to the guest/namespace, we more-or-less know what address they'll be going to. So we pre-fill this destination address in our header buffers for IPv4. We can't do the same for IPv6 because we could need either the global or link-local address for the guest. In future we're going to want more flexibility for the destination address, so this pre-filling will get in the way. Change the flow so we always fill in the IPv4 destination address for each packet, rather than prefilling it from proto_update_l2_buf(). In fact for TCP we already redundantly filled the destination for each packet anyway. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	epoll: Use different epoll types for passt and pasta tap fds	David Gibson	2023-08-13	1	-3/+7
\| \| \| \| \| \| \| \| \| \| \|	Currently we have a single epoll event type for the "tap" fd, which could be either a handle on a /dev/net/tun device (pasta) or a connected Unix socket (passt). However for the two modes we call different handler functions. Simplify this a little by using different epoll types and dispatching directly to the correct handler function. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	epoll: Split listening Unix domain socket into its own type	David Gibson	2023-08-13	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \|	tap_handler() actually handles events on three different types of object: the /dev/tap character device (pasta), a connected Unix domain socket (passt) or a listening Unix domain socket (passt). The last, in particular, really has no handling in common with the others, so split it into its own epoll type and directly dispatch to the relevant handler from the top level. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	epoll: Split handling of listening TCP sockets into their own handler	David Gibson	2023-08-13	1	-2/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	tcp_sock_handler() handles both listening TCP sockets, and connected TCP sockets, but what it needs to do in those cases has essentially nothing in common. Therefore, give listening sockets their own epoll_type value and dispatch directly to their own handler from the top level. Furthermore, the two handlers need essentially entirely different information from the reference: we re-(ab)used the index field in the tcp_epoll_ref to indicate the port for the listening socket, but that's not the same meaning. So, switch listening sockets to their own reference type which we can lay out as we please. That lets us remove the listen and outbound fields from the normal (connected) tcp_epoll_ref, reducing it to just the connection table index. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	epoll: Split handling of TCP timerfds into its own handler function	David Gibson	2023-08-13	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \|	tcp_sock_handler() actually handles several different types of fd events. This includes timerfds that aren't sockets at all. The handling of these has essentially nothing in common with the other cases. So, give the TCP timers there own epoll_type value and dispatch directly to their handler. This also means we can remove the timer field from tcp_epoll_ref, the information it encoded is now implicit in the epoll_type value. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	epoll: Tiny cleanup to udp_sock_handler()	David Gibson	2023-08-13	1	-2/+1
\| \| \| \| \| \| \| \| \|	Move the test for c->no_udp into the function itself, rather than in the dispatching switch statement to better localize the UDP specific logic, and make for greated consistency with other handler functions. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	epoll: Split handling of ICMP and ICMPv6 sockets	David Gibson	2023-08-13	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We have different epoll type values for ICMP and ICMPv6 sockets, but they both call the same handler function, icmp_sock_handler(). However that function does essentially nothing in common for the two cases. So, split it into icmp_sock_handler() and icmpv6_sock_handler() and dispatch them separately from the top level. While we're there remove some parameters that the function was never using anyway. Also move the test for c->no_icmp into the functions, so that all the logic specific to ICMP is within the handler, rather than in the top level dispatch code. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	epoll: Fold sock_handler into general switch on epoll event fd	David Gibson	2023-08-13	1	-27/+27
\| \| \| \| \| \| \| \| \| \| \| \|	When we process events from epoll_wait(), we check for a number of special cases before calling sock_handler() which then dispatches based on the protocol type of the socket in the event. Fold these cases together into a single switch on the fd type recorded in the epoll data field. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	epoll: Always use epoll_ref for the epoll data variable	David Gibson	2023-08-13	1	-5/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	epoll_ref contains a variety of information useful when handling epoll events on our sockets, and we place it in the epoll_event data field returned by epoll. However, for a few other things we use the 'fd' field in the standard union of types for that data field. This actually introduces a bug which is vanishingly unlikely to hit in practice, but very nasty if it ever did: theoretically if we had a very large file descriptor number for fd_tap or fd_tap_listen it could overflow into bits that overlap with the 'proto' field in epoll_ref. With some very bad luck this could mean that we mistakenly think an event on a regular socket is an event on fd_tap or fd_tap_listen. More practically, using different (but overlapping) fields of the epoll_data means we can't unify dispatch for the various different objects in the epoll. Therefore use the same epoll_ref as the data for the tap fds and the netns quit fd, adding new fd type values to describe them. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	epoll: Generalize epoll_ref to cover things other than sockets	David Gibson	2023-08-13	1	-13/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The epoll_ref type includes fields for the IP protocol of a socket, and the socket fd. However, we already have a few things in the epoll which aren't protocol sockets, and we may have more in future. Rename these fields to an abstract "fd type" and file descriptor for more generality. Similarly, rather than using existing IP protocol numbers for the type, introduce our own number space. For now these just correspond to the supported protocols, but we'll expand on that in future. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	Use C11 anonymous members to make poll refs less verbose to use	David Gibson	2023-08-04	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \|	union epoll_ref has a deeply nested set of structs and unions to let us subdivide it into the various different fields we want. This means that referencing elements can involve an awkward long string of intermediate fields. Using C11 anonymous structs and unions lets us do this less clumsily. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	passt: Fix error check for signal(), improve error messages	Stefano Brivio	2023-04-13	1	-2/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Valtteri reports that if SIGPIPE already has a disposition set by the parent process, such as systemd with the default setting of IgnoreSIGPIPE=yes, signal() will return the previous value, not zero, and this is not an error: check for SIG_ERR instead. While at it, split messages for failures of sigaction() and signal(), and report the actual error. Reported-by: Valtteri Vuorikoski <vuori@notcom.org> Fixes: 8534be076c73 ("Catch failures when installing signal handlers") Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	passt: Relicense to GPL 2.0, or any later version	Stefano Brivio	2023-04-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In practical terms, passt doesn't benefit from the additional protection offered by the AGPL over the GPL, because it's not suitable to be executed over a computer network. Further, restricting the distribution under the version 3 of the GPL wouldn't provide any practical advantage either, as long as the passt codebase is concerned, and might cause unnecessary compatibility dilemmas. Change licensing terms to the GNU General Public License Version 2, or any later version, with written permission from all current and past contributors, namely: myself, David Gibson, Laine Stump, Andrea Bolognani, Paul Holzinger, Richard W.M. Jones, Chris Kuhn, Florian Weimer, Giuseppe Scrivano, Stefan Hajnoczi, and Vasiliy Ulyanov. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	treewide: Fix header includes to build with musl	Chris Kuhn	2023-03-09	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Roughly inspired from a patch by Chris Kuhn: fix up includes so that we can build against musl: glibc is more lenient as headers generally include a larger amount of other headers. Compared to the original patch, I only included what was needed directly in C files, instead of adding blanket includes in local header files. It's a bit more involved, but more consistent with the current (not ideal) situation. Reported-by: Chris Kuhn <kuhnchris+github@kuhnchris.eu> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	conf, passt: Rename stderr to force_stderr	Chris Kuhn	2023-03-09	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	While building against musl, gcc informs us that 'stderr' is a protected keyword. This probably comes from a #define stderr (stderr) in musl's stdio.h, to avoid a clash with extern FILE *const stderr, but I didn't really track it down. Just rename it to force_stderr, it makes more sense. [sbrivio: Added commit message] Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>