aboutgitcodebugslistschat
path: root/udp.c
Commit message (Collapse)AuthorAgeFilesLines
...
* Use typing to reduce chances of IPv4 endianness errorsDavid Gibson2022-11-041-15/+15
| | | | | | | | | | | | | | | | | | | We recently corrected some errors handling the endianness of IPv4 addresses. These are very easy errors to make since although we mostly store them in network endianness, we sometimes need to manipulate them in host endianness. To reduce the chances of making such mistakes again, change to always using a (struct in_addr) instead of a bare in_addr_t or uint32_t to store network endian addresses. This makes it harder to accidentally do arithmetic or comparisons on such addresses as if they were host endian. We introduce a number of IN4_IS_ADDR_*() helpers to make it easier to directly work with struct in_addr values. This has the additional benefit of making the IPv4 and IPv6 paths more visually similar. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Use IPV4_IS_LOOPBACK more widelyDavid Gibson2022-11-041-1/+1
| | | | | | | | | | | | | This macro checks if an IPv4 address is in the loopback network (127.0.0.0/8). There are two places where we open code an identical check, use the macro instead. There are also a number of places we specifically exclude the loopback address (127.0.0.1), but we should actually be excluding anything in the loopback network. Change those sites to use the macro as well. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* udp: Fix port and address checks for DNS forwarderStefano Brivio2022-10-151-3/+3
| | | | | | | | | | | | | | | | | | | First off, as we swap endianness for source ports in udp_fill_data_v{4,6}(), we want host endianness, not network endianness. It doesn't actually matter if we use htons() or ntohs() here, but the current version is confusing. In the IPv4 path, when we remap DNS answers, we already swapped the endianness as needed for the source port: don't swap it again, otherwise we'll not map DNS answers for IPv4. In the IPv6 path, when we remap DNS answers, we want to check that they came from our upstream DNS server, not the one configured via --dns-forward (which doesn't even need to exist for this functionality to work). Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* conf, tcp, udp: Allow specification of interface to bind toStefano Brivio2022-10-151-17/+18
| | | | | | | | | | | | | | | | Since kernel version 5.7, commit c427bfec18f2 ("net: core: enable SO_BINDTODEVICE for non-root users"), we can bind sockets to interfaces, if they haven't been bound yet (as in bind()). Introduce an optional interface specification for forwarded ports, prefixed by %, that can be passed together with an address. Reported use case: running local services that use ports we want to have externally forwarded: https://github.com/containers/podman/issues/14425 Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* Move logging functions to a new file, log.cStefano Brivio2022-10-141-0/+1
| | | | | | | | Logging to file is going to add some further complexity that we don't want to squeeze into util.c. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* udp: Replace pragma to ignore bogus stringop-overread warning with workaroundStefano Brivio2022-09-291-8/+18
| | | | | | | | | | | | | | | | | | | Commit c318ffcb4c93 ("udp: Ignore bogus -Wstringop-overread for write() from gcc 12.1") uses a gcc pragma to ignore a bogus warning, which started appearing on gcc 12.1 (aarch64 and x86_64) due to: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103483 ...but gcc 12.2 has the same issue. Not just that: if LTO is enabled, the pragma itself is ignored (this wasn't the case with gcc 12.1), because of: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80922 Drop the pragma, and assign a frame (in the networking sense) pointer from the offset of the Ethernet header in the buffer, then pass it to write() and pcap(), so that gcc doesn't obsess anymore with the fact that an Ethernet header is 14 bytes and we're sending more than that. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Fix widespread off-by-one error dealing with port numbersDavid Gibson2022-09-241-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | Port numbers (for both TCP and UDP) are 16-bit, and so fit exactly into a 'short'. USHRT_MAX is therefore the maximum port number and this is widely used in the code. Unfortunately, a lot of those places don't actually want the maximum port number (USHRT_MAX == 65535), they want the total number of ports (65536). This leads to a number of potentially nasty consequences: * We have buffer overruns on the port_fwd::delta array if we try to use port 65535 * We have similar potential overruns for the tcp_sock_* arrays * Interestingly udp_act had the correct size, but we can calculate it in a more direct manner * We have a logical overrun of the ports bitmap as well, although it will just use an unused bit in the last byte so isnt harmful * Many loops don't consider port 65535 (which does mitigate some but not all of the buffer overruns above) * In udp_invert_portmap() we incorrectly compute the reverse port translation for return packets Correct all these by using a new NUM_PORTS defined explicitly for this purpose. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* Treat port numbers as unsignedDavid Gibson2022-09-241-1/+1
| | | | | | | | | | | Port numbers are unsigned values, but we're storing them in (signed) int variables in some places. This isn't actually harmful, because int is large enough to hold the entire range of ports. However in places we don't want to use an in_port_t (usually to avoid overflow on the last iteration of a loop) it makes more conceptual sense to use an unsigned int. This will also avoid some problems with later cleanups. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* Don't use indirect remap functions for conf_ports()David Gibson2022-09-241-22/+0
| | | | | | | | | | Now that we've delayed initialization of the UDP specific "reverse" map until udp_init(), the only difference between the various 'remap' functions used in conf_ports() is which array they target. So, simplify by open coding the logic into conf_ports() with a pointer to the correct mapping array. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* udp: Delay initialization of UDP reversed port mapping tableDavid Gibson2022-09-241-3/+22
| | | | | | | | | | | | | | | Because it's connectionless, when mapping UDP ports we need, in addition to the table of deltas for destination ports needed by TCP, we need an inverted table to translate the source ports on return packets. Currently we fill out the inverted table at the same time we construct the main table in udp_remap_to_tap() and udp_remap_to_init(). However, we don't use either table until after we've initialized UDP, so we can delay the construction of the reverse table to udp_init(). This makes the configuration more symmetric between TCP and UDP which will enable further cleanups. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* Consolidate port forwarding configuration into a common structureDavid Gibson2022-09-241-17/+13
| | | | | | | | | | | | | | | The configuration for how to forward ports in and out of the guest/ns is divided between several different variables. For each connect direction and protocol we have a mode in the udp/tcp context structure, a bitmap of which ports to forward also in the context structure and an array of deltas to apply if the outward facing and inward facing port numbers are different. This last is a separate global variable, rather than being in the context structure, for no particular reason. UDP also requires an additional array which has the reverse mapping used for return packets. Consolidate these into a re-used substructure in the context structure. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* udp: Don't drop zero-length outbound UDP packetsDavid Gibson2022-09-131-7/+10
| | | | | | | | | | | | | udp_tap_handler() currently skips outbound packets if they have a payload length of zero. This is not correct, since in a datagram protocol zero length packets still have meaning. Adjust this to correctly forward the zero-length packets by using a msghdr with msg_iovlen == 0. Bugzilla: https://bugs.passt.top/show_bug.cgi?id=19 Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* udp: Don't pre-initialize msghdr arrayDavid Gibson2022-09-131-1/+5
| | | | | | | | | | | | In udp_tap_handler() the array of msghdr structures, mm[], is initialized to zero. Since UIO_MAXIOV is 1024, this can be quite a large zero, which is expensive if we only end up using a few of its entries. It also makes it less obvious how we're setting all the control fields at the point we actually invoke sendmmsg(). Rather than pre-initializing it, just initialize each element as we use it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* Make substructures for IPv4 and IPv6 specific context informationDavid Gibson2022-07-301-31/+31
| | | | | | | | | | | | The context structure contains a batch of fields specific to IPv4 and to IPv6 connectivity. Split those out into a sub-structure. This allows the conf_ip4() and conf_ip6() functions, which take the entire context but touch very little of it, to be given more specific parameters, making it clearer what it affects without stepping through the code. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* Separate IPv4 and IPv6 configurationDavid Gibson2022-07-301-4/+4
| | | | | | | | | | | | | | | | | | | After recent changes, conf_ip() now has essentially entirely disjoint paths for IPv4 and IPv6 configuration. So, it's cleaner to split them out into different functions conf_ip4() and conf_ip6(). Splitting these out also lets us make the interface a bit nicer, having them return success or failure directly, rather than manipulating c->v4 and c->v6 to indicate success/failure of the two versions. Since these functions may also initialize the interface index for each protocol, it turns out we can then drop c->v4 and c->v6 entirely, replacing tests on those with tests on whether c->ifi4 or c->ifi6 is non-zero (since a 0 interface index is never valid). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Whitespace fixes] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* udp: Ignore bogus -Wstringop-overread for write() from gcc 12.1Stefano Brivio2022-05-191-0/+4
| | | | | | | | | | | | | | | | | With current OpenSUSE Tumbleweed on aarch64 (gcc-12-1.3.aarch64) and on x86_64 (gcc-12-1.4.x86_64), but curiously not on armv7hl (gcc-12-1.3.armv7hl), gcc warns about using the _pointer_ to the 802.3 header to write the whole frame to the tap descriptor: reading between 62 and 4294967357 bytes from a region of size 14 which is bogus: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103483 Probably declaring udp_sock_fill_data_v{4,6}() as noinline would "fix" this, but that's on the data path, so I'd rather not. Use a gcc pragma instead. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* conf, tcp, udp: Allow address specification for forwarded portsStefano Brivio2022-05-011-63/+99
| | | | | | | | | | | | | This feature is available in slirp4netns but was missing in passt and pasta. Given that we don't do dynamic memory allocation, we need to bind sockets while parsing port configuration. This means we need to process all other options first, as they might affect addressing and IP version support. It also implies a minor rework of how TCP and UDP implementations bind sockets. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* udp: Out-of-bounds read, CWE-125 in udp_timer()Stefano Brivio2022-04-071-1/+1
| | | | | | | Not an actual issue due to how it's typically stored, but udp_act can also be used for ports 65528-65535. Reported by Coverity. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* treewide: Unchecked return value from library, CWE-252Stefano Brivio2022-04-071-1/+2
| | | | | | | All instances were harmless, but it might be useful to have some debug messages here and there. Reported by Coverity. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tap, tcp, udp, icmp: Cut down on some oversized buffersStefano Brivio2022-03-291-2/+2
| | | | | | | | | The existing sizes provide no measurable differences in throughput and packet rates at this point. They were probably needed as batched implementations were not complete, but they can be decreased quite a bit now. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* udp: Move flags before ts in struct udp_tap_port, avoid end paddingStefano Brivio2022-03-291-3/+3
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* treewide: Mark constant references as constStefano Brivio2022-03-291-16/+18
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* treewide: Packet abstraction with mandatory boundary checksStefano Brivio2022-03-291-20/+26
| | | | | | | | | | | | | | | | | | | | Implement a packet abstraction providing boundary and size checks based on packet descriptors: packets stored in a buffer can be queued into a pool (without storage of its own), and data can be retrieved referring to an index in the pool, specifying offset and length. Checks ensure data is not read outside the boundaries of buffer and descriptors, and that packets added to a pool are within the buffer range with valid offset and indices. This implies a wider rework: usage of the "queueing" part of the abstraction mostly affects tap_handler_{passt,pasta}() functions and their callees, while the "fetching" part affects all the guest or tap facing implementations: TCP, UDP, ICMP, ARP, NDP, DHCP and DHCPv6 handlers. Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp, udp, util: Enforce 24-bit limit on socket numbersStefano Brivio2022-03-291-0/+7
| | | | | | | This should never happen, but there are no formal guarantees: ensure socket numbers are below SOCKET_MAX. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* udp: Use flags for local, loopback, and configured unicast bindsStefano Brivio2022-03-281-25/+23
| | | | | | | | | | | There's no value in keeping a separate timestamp for activity and for aging of local binds, given that they have the same timeout. Reduce that to a single timestamp, with a flag indicating the local bind. Also use flags instead of separate int fields for loopback and configured unicast address usage as source address. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* udp: Split buffer queueing/writing parts of udp_sock_handler()Stefano Brivio2022-03-281-171/+193
| | | | | | | | | | ...it became too hard to follow: split it off to udp_sock_fill_data_v{4,6}. While at it, use IN6_ARE_ADDR_EQUAL(a, b), courtesy of netinet/in.h, instead of open-coded memcmp(). Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* udp: Drop _splice from recv, send, sendto static buffer namesStefano Brivio2022-03-281-29/+23
| | | | | | | It's already implied by the fact they don't have "l2" in their names, and dropping it improves readability a bit. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* udp: Explicitly initialise sin6_scope_id and sin_zero in sockaddr_in{,6}Stefano Brivio2022-02-251-0/+2
| | | | | | | Not functionally needed, but gcc versions 7 to 9 (at least) will issue a warning otherwise. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp, udp: Receive batching doesn't pay off when writing single frames to tapStefano Brivio2022-02-211-16/+17
| | | | | | | | | | | | | | | | In pasta mode, when we get data from sockets and write it as single frames to the tap device, we batch receive operations considerably, and then (conceptually) split the data in many smaller writes. It looked like an obvious choice, but performance is actually better if we receive data in many small frame-sized recvmsg()/recvmmsg(). The syscall overhead with the previous behaviour, observed by perf, comes predominantly from write operations, but receiving data in shorter chunks probably improves cache locality by a considerable amount. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* udp: Allow loopback connections from host using configured unicast addressStefano Brivio2022-02-211-2/+18
| | | | | | | | | | | | Likely for testing purposes only: allow connections from host to guest or namespace using, as connection target, the configured, possibly global unicast address. In this case, we have to map the destination address to a link-local address, and for port-based tracked responses, the source address needs to be again the unicast address: not loopback, not link-local. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* conf, udp: Introduce basic DNS forwardingStefano Brivio2022-02-211-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | For compatibility with libslirp/slirp4netns users: introduce a mechanism to map, in the UDP routines, an address facing guest or namespace to the first IPv4 or IPv6 address resulting from configuration as resolver. This can be enabled with the new --dns-forward option. This implies that sourcing and using DNS addresses and search lists, passed via command line or read from /etc/resolv.conf, is not bound anymore to DHCP/DHCPv6/NDP usage: for example, pasta users might just want to use addresses from /etc/resolv.conf as mapping target, while not passing DNS options via DHCP. Reflect this in all the involved code paths by differentiating DHCP/DHCPv6/NDP usage from DNS configuration per se, and in the new options --dhcp-dns, --dhcp-search for pasta, and --no-dhcp-dns, --no-dhcp-search for passt. This should be the last bit to enable substantial compatibility between slirp4netns.sh and slirp4netns(1): pass the --dns-forward option from the script too. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt, pasta: Namespace-based sandboxing, defer seccomp policy applicationStefano Brivio2022-02-211-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To reach (at least) a conceptually equivalent security level as implemented by --enable-sandbox in slirp4netns, we need to create a new mount namespace and pivot_root() into a new (empty) mountpoint, so that passt and pasta can't access any filesystem resource after initialisation. While at it, also detach IPC, PID (only for passt, to prevent vulnerabilities based on the knowledge of a target PID), and UTS namespaces. With this approach, if we apply the seccomp filters right after the configuration step, the number of allowed syscalls grows further. To prevent this, defer the application of seccomp policies after the initialisation phase, before the main loop, that's where we expect bad things to happen, potentially. This way, we get back to 22 allowed syscalls for passt and 34 for pasta, on x86_64. While at it, move #syscalls notes to specific code paths wherever it conceptually makes sense. We have to open all the file handles we'll ever need before sandboxing: - the packet capture file can only be opened once, drop instance numbers from the default path and use the (pre-sandbox) PID instead - /proc/net/tcp{,v6} and /proc/net/udp{,v6}, for automatic detection of bound ports in pasta mode, are now opened only once, before sandboxing, and their handles are stored in the execution context - the UNIX domain socket for passt is also bound only once, before sandboxing: to reject clients after the first one, instead of closing the listening socket, keep it open, accept and immediately discard new connection if we already have a valid one Clarify the (unchanged) behaviour for --netns-only in the man page. To actually make passt and pasta processes run in a separate PID namespace, we need to unshare(CLONE_NEWPID) before forking to background (if configured to do so). Introduce a small daemon() implementation, __daemon(), that additionally saves the PID file before forking. While running in foreground, the process itself can't move to a new PID namespace (a process can't change the notion of its own PID): mention that in the man page. For some reason, fork() in a detached PID namespace causes SIGTERM and SIGQUIT to be ignored, even if the handler is still reported as SIG_DFL: add a signal handler that just exits. We can now drop most of the pasta_child_handler() implementation, that took care of terminating all processes running in the same namespace, if pasta started a shell: the shell itself is now the init process in that namespace, and all children will terminate once the init process exits. Issuing 'echo $$' in a detached PID namespace won't return the actual namespace PID as seen from the init namespace: adapt demo and test setup scripts to reflect that. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp, udp, util: Fixes for bitmap handling on big-endian, castsStefano Brivio2022-01-261-2/+2
| | | | | | | | Bitmap manipulating functions would otherwise refer to inconsistent sets of bits on big-endian architectures. While at it, fix up a couple of casts. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Drop <linux/ipv6.h> include, carry own ipv6hdr and opt_hdr definitionsStefano Brivio2022-01-261-2/+0
| | | | | | | This is the only remaining Linux-specific include -- drop it to avoid clang-tidy warnings and to make code more portable. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Add cppcheck target, test, and address resulting warningsStefano Brivio2021-10-211-10/+8
| | | | | | | ...mostly false positives, but a number of very relevant ones too, in tcp_get_sndbuf(), tcp_conn_from_tap(), and siphash PREAMBLE(). Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* udp: Avoid static initialiser for udp{4,6}_l2_bufStefano Brivio2021-10-211-18/+23
| | | | | | With the new UDP_TAP_FRAMES value, the binary size grows considerably. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* udp: Fix maximum payload size calculation for IPv4 buffers, bump UDP_TAP_FRAMESStefano Brivio2021-10-211-2/+3
| | | | | | | | The issue with a higher UDP_TAP_FRAMES was actually coming from a payload size the guest couldn't digest. Fix that, and bump UDP_TAP_FRAMES back to 128. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Fix build with gcc 7, use std=c99, enable some more Clang checkersStefano Brivio2021-10-211-43/+46
| | | | | | | | | | | | | | Unions and structs, you all have names now. Take the chance to enable bugprone-reserved-identifier, cert-dcl37-c, and cert-dcl51-cpp checkers in clang-tidy. Provide a ffsl() weak declaration using gcc built-in. Start reordering includes, but that's not enough for the llvm-include-order checker yet. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* ndp, dhcpv6, tcp, udp: Always use link-local as source if gateway isn'tStefano Brivio2021-10-201-0/+5
| | | | | | | | | | | | This shouldn't happen on any sane configuration, but I just met an example of that: the default IPv6 gateway on the host is configured with a global unicast address, we use that as source for RA, DHCPv6 replies, and the guest ignores it. Same later on if we talk TCP or UDP and the guest has no idea where that address comes from. Use our link-local address in case the gateway address is global. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Add clang-tidy Makefile target and test, take care of warningsStefano Brivio2021-10-201-9/+9
| | | | | | | Most are just about style and form, but a few were actually serious mistakes (NDP-related). Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Address warnings from Clang's scan-buildStefano Brivio2021-10-201-4/+11
| | | | | | All false positives so far. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Address gcc 11 warningsStefano Brivio2021-10-201-3/+5
| | | | | | | | | A mix of unchecked return values, a missing permission mask for open(2) with O_CREAT, and some false positives from -Wstringop-overflow and -Wmaybe-uninitialized. Reported-by: Martin Hauke <mardnh@gmx.de> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* udp: drop bogus udp_tap_map ts assignmentStefan Hajnoczi2021-10-151-1/+0
| | | | | | | | | | | | | | | | | | The 'ts' field is a timestamp so assigning the socket file descriptor is incorrect. There is no actual bug because the current time is assigned just a few lines later: udp_tap_map[V4][src].sock = s; udp_tap_map[V4][src].ts = s; ^^^^^^^^^^^ bogus ^^^^^^^^^^ bitmap_set(udp_act[V4][UDP_ACT_TAP], src); } udp_tap_map[V4][src].ts = now->tv_sec; ^^^^^^^^^^^^^^^ correct ^^^^^^^^^^^^^^ Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* conf, tcp, udp: Add --no-map-gw to disable mapping gateway address to hostStefano Brivio2021-10-141-2/+2
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt, pasta: Add seccomp supportStefano Brivio2021-10-141-0/+7
| | | | | | | | | | | | | | | | | | List of allowed syscalls comes from comments in the form: #syscalls <list> for syscalls needed both in passt and pasta mode, and: #syscalls:pasta <list> #syscalls:passt <list> for syscalls specifically needed in pasta or passt mode only. seccomp.sh builds a list of BPF statements from those comments, prefixed by a binary search tree to keep lookup fast. While at it, clean up a bit the Makefile using wildcards. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* pasta: Allow specifying paths and names of namespacesGiuseppe Scrivano2021-10-071-2/+2
| | | | | | | | | | | | | | | | | | Based on a patch from Giuseppe Scrivano, this adds the ability to: - specify paths and names of target namespaces to join, instead of a PID, also for user namespaces, with --userns - request to join or create a network namespace only, without entering or creating a user namespace, with --netns-only - specify the base directory for netns mountpoints, with --nsrun-dir Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> [sbrivio: reworked logic to actually join the given namespaces when they're not created, implemented --netns-only and --nsrun-dir, updated pasta demo script and man page] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tap: Completely de-serialise input message batchesStefano Brivio2021-09-271-4/+9
| | | | | | | | | | | | | Until now, messages would be passed to protocol handlers in a single batch only if they happened to be dequeued in a row. Packets interleaved between different connections would result in multiple calls to the same protocol handler for a single connection. Instead, keep track of incoming packet descriptors, arrange them in sequences, and call protocol handlers only as we completely sorted input messages in batches. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* udp: Switch to new socket message after 32KiB instead of 64KiBStefano Brivio2021-09-271-2/+2
| | | | | | | For some reason, this measurably improves performance with qemu and virtio-net. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* udp: Decrease UDP_TAP_FRAMES to 16Stefano Brivio2021-09-271-1/+1
| | | | | | | Similarly to the decrease in TCP_TAP_FRAMES, this improves fairness, with a very small impact on performance. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* udp: Reset iov_base after sending partial message on sendmmsg() failureStefano Brivio2021-09-091-0/+2
| | | | | | | We set the length while processing messges, but the starting address is pre-initialised. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>