aboutgitcodebugslistschat
path: root/netlink.c
Commit message (Collapse)AuthorAgeFilesLines
* netlink: Split nl_route() into separate operation functionsDavid Gibson2023-08-041-89/+148
| | | | | | | | | | nl_route() can perform 3 quite different operations based on the 'op' parameter. Split this into separate functions for each one. This requires more lines of code, but makes the internal logic of each operation much easier to follow. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* netlink: Split nl_addr() into separate operation functionsDavid Gibson2023-08-041-93/+139
| | | | | | | | | | | | | | | nl_addr() can perform three quite different operations based on the 'op' parameter, each of which uses a different subset of the parameters. Split them up into a function for each operation. This does use more lines of code, but the overlap wasn't that great, and the separated logic is much easier to follow. It's also clearer in the callers what we expect the netlink operations to do, and what information it uses. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Minor formatting fixes in pasta_ns_conf()] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* netlink: Split up functionality of nl_link()David Gibson2023-08-041-59/+82
| | | | | | | | | | | | | | | | | | | | | | | nl_link() performs a number of functions: it can bring links up, set MAC address and MTU and also retrieve the existing MAC. This makes for a small number of lines of code, but high conceptual complexity: it's quite hard to follow what's going on both in nl_link() itself and it's also not very obvious which function its callers are intending to use. Clarify this, by splitting nl_link() into nl_link_up(), nl_link_set_mac(), and nl_link_get_mac(). The first brings up a link, optionally setting the MTU, the others get or set the MAC address. This fixes an arguable bug in pasta_ns_conf(): it looks as though that was intended to retrieve the guest MAC whether or not c->pasta_conf_ns is set. However, it only actually does so in the !c->pasta_conf_ns case: the fact that we set up==1 means we would only ever set, never get, the MAC in the nl_link() call in the other path. We get away with this because the MAC will quickly be discovered once we receive packets on the tap interface. Still, it's neater to always get the MAC address here. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* netlink: Use correct interface index in NL_SET mode2023_06_27.289301bDavid Gibson2023-06-271-2/+2
| | | | | | | | | | | | | | | | | | nl_addr() and nl_route() take an 'op' selector which affects a number of parameters to the netlink call. Unfortunately when we introduced this option a bug was introduced so that we always use the interface index for the host side, rather than the one for the pasta namespace. Really, the entire interface to nl_addr() and nl_route() is pretty bad: it's tightly coupled with the use cases of its callers. This is a minimal fix which doesn't address that, but also doesn't make it significantly worse. Link: https://bugs.passt.top/show_bug.cgi?id=59 Fixes: 2fe046185634 ("netlink: Add functionality to copy routes from outer namespace") Fixes: e89da3cf03b2 ("netlink: Add functionality to copy addresses from outer namespace") Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* netlink: Add functionality to copy addresses from outer namespaceStefano Brivio2023-05-231-15/+47
| | | | | | | | | | | | | | | | | | | | Similarly to what we've just done with routes, support NL_DUP for addresses (currently not exposed): nl_addr() can optionally copy mulitple addresses to the target namespace, by fixing up data from the dump with appropriate flags and interface index, and repeating it back to the kernel on the socket opened in the target namespace. Link-local addresses are not copied: the family is set to AF_UNSPEC, which means the kernel will ignore them. Same for addresses from a mismatching address (pre-4.19 kernels without support for NETLINK_GET_STRICT_CHK). Ignore IFA_LABEL attributes by changing their type to IFA_UNSPEC, because in general they will report mismatching names, and we don't really need to use labels as we already know the interface index. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* netlink: Add functionality to copy routes from outer namespaceStefano Brivio2023-05-231-17/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of just fetching the default gateway and configuring a single equivalent route in the target namespace, on 'pasta --config-net', it might be desirable in some cases to copy the whole set of routes corresponding to a given output interface. For instance, in: https://github.com/containers/podman/issues/18539 IPv4 Default Route Does Not Propagate to Pasta Containers on Hetzner VPSes configuring the default gateway won't work without a gateway-less route (specifying the output interface only), because the default gateway is, somewhat dubiously, not on the same subnet as the container. This is a similar case to the one covered by commit 7656a6f88882 ("conf: Adjust netmask on mismatch between IPv4 address/netmask and gateway"), and I'm not exactly proud of that workaround. We also have: https://bugs.passt.top/show_bug.cgi?id=49 pasta does not work with tap-style interface for which, eventually, we should be able to configure a gateway-less route in the target namespace. Introduce different operation modes for nl_route(), including a new NL_DUP one, not exposed yet, which simply parrots back to the kernel the route dump for a given interface from the outer namespace, fixing up flags and interface indices on the way, and requesting to add the same routes in the target namespace, on the interface we manage. For n routes we want to duplicate, send n identical netlink requests including the full dump: routes might depend on each other and the kernel processes RTM_NEWROUTE messages sequentially, not atomically, and repeating the full dump naturally resolves dependencies without the need to actually calculate them. I'm not kidding, it actually works pretty well. Link: https://github.com/containers/podman/issues/18539 Link: https://bugs.passt.top/show_bug.cgi?id=49 Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* netlink: Fix comment about response buffer size for nl_req()Stefano Brivio2023-05-231-1/+1
| | | | | | Fixes: fde8004ab0b4 ("netlink: Use 8 KiB * netlink message header size as response buffer") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* passt: Relicense to GPL 2.0, or any later versionStefano Brivio2023-04-061-1/+1
| | | | | | | | | | | | | | | | | | | In practical terms, passt doesn't benefit from the additional protection offered by the AGPL over the GPL, because it's not suitable to be executed over a computer network. Further, restricting the distribution under the version 3 of the GPL wouldn't provide any practical advantage either, as long as the passt codebase is concerned, and might cause unnecessary compatibility dilemmas. Change licensing terms to the GNU General Public License Version 2, or any later version, with written permission from all current and past contributors, namely: myself, David Gibson, Laine Stump, Andrea Bolognani, Paul Holzinger, Richard W.M. Jones, Chris Kuhn, Florian Weimer, Giuseppe Scrivano, Stefan Hajnoczi, and Vasiliy Ulyanov. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Work around weird false positives with cppcheck-2.9.1David Gibson2023-03-211-1/+1
| | | | | | | | | | | | | | | | | | Commit 89e38f55 "treewide: Fix header includes to build with musl" added extra #includes to work with musl. Unfortunately with the cppcheck version I'm using (cppcheck-2.9-1.fc37.x86_64 in Fedora 37) this causes weird false positives: specifically cppcheck seems to hit a #error in <bits/unistd.h> complaining about including it directly instead of via <unistd.h> (which is not something we're doing). I have no idea why that would be happening; but I'm guessing it has to be a bug in the cpp implementation in that cppcheck version. In any case, it's possible to work around this by moving the include of <unistd.h> before the include of <signal.h>. So, do that. Fixes: 89e38f55405d ("treewide: Fix header includes to build with musl") Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* treewide: Fix header includes to build with muslChris Kuhn2023-03-091-0/+1
| | | | | | | | | | | | | | | Roughly inspired from a patch by Chris Kuhn: fix up includes so that we can build against musl: glibc is more lenient as headers generally include a larger amount of other headers. Compared to the original patch, I only included what was needed directly in C files, instead of adding blanket includes in local header files. It's a bit more involved, but more consistent with the current (not ideal) situation. Reported-by: Chris Kuhn <kuhnchris+github@kuhnchris.eu> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* netlink: Use 8 KiB * netlink message header size as response bufferStefano Brivio2023-03-091-6/+9
| | | | | | | | | | | | | | | | | | | ...instead of BUFSIZ. On musl, BUFSIZ is 1024, so we'll typically truncate the response to the request we send in nl_link(). It's usually 8192 or more with glibc. There doesn't seem to be any macro defining the rtnetlink maximum message size, and iproute2 just hardcodes 1024 * 1024 for the receive buffer, but the example in netlink(7) makes somewhat sense, looking at the kernel implementation. It's not very clean, but we're very unlikely to hit that limit, and if we do, we'll find out painlessly, because NLA_OK() will tell us right away. Reported-by: Chris Kuhn <kuhnchris+passt@kuhnchris.eu> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* convert all remaining err() followed by exit() to die()Laine Stump2023-02-161-2/+1
| | | | | | | | This actually leaves us with 0 uses of err(), but someone could want to use it in the future, so we may as well leave it around. Signed-off-by: Laine Stump <laine@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* pasta: do not leak netlink sock into childPaul Holzinger2023-02-121-2/+2
| | | | | | | | | | | | | | When spawning a child command with pasta command... pasta should not leak fds that it opened. Only the fds that were already open should be given to the child. Run `pasta --config-net -- ls -l /proc/self/fd` from a terminal where only stdin/out/err are open. The fd 3 was opend by ls to read the /proc/self/fd dir. But fd 5 is the netlink socket that was opend in pasta. To prevent such a leak we will open the socket with SOCK_CLOEXEC. Signed-off-by: Paul Holzinger <pholzing@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* conf: Bind inbound ports with CAP_NET_BIND_SERVICE before isolate_user()Stefano Brivio2022-10-151-19/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Even if CAP_NET_BIND_SERVICE is granted, we'll lose the capability in the target user namespace as we isolate the process, which means we're unable to bind to low ports at that point. Bind inbound ports, and only those, before isolate_user(). Keep the handling of outbound ports (for pasta mode only) after the setup of the namespace, because that's where we'll bind them. To this end, initialise the netlink socket for the init namespace before isolate_user() as well, as we actually need to know the addresses of the upstream interface before binding ports, in case they're not explicitly passed by the user. As we now call nl_sock_init() twice, checking its return code from conf() twice looks a bit heavy: make it exit(), instead, as we can't do much if we don't have netlink sockets. While at it: - move the v4_only && v6_only options check just after the first option processing loop, as this is more strictly related to option parsing proper - update the man page, explaining that CAP_NET_BIND_SERVICE is *not* the preferred way to bind ports, because passt and pasta can be abused to allow other processes to make effective usage of it. Add a note about the recommended sysctl instead - simplify nl_sock_init_do() now that it's called once for each case Reported-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* netlink: Disable duplicate address detection for configured IPv6 addressStefano Brivio2022-10-151-0/+3
| | | | | | | | | | | | | | | | | | With default options, when we pass --config-net, the IPv6 address is actually going to be recycled from the init namespace, so it is in fact duplicated, but duplicate address detection has no way to find out. With a different configured address, that's not the case, but anyway duplicate address detection will be unable to see this. In both cases, we're wasting time for nothing. Pass the IFA_F_NODAD flag as we configure globally scoped IPv6 addresses via netlink. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* Move logging functions to a new file, log.cStefano Brivio2022-10-141-0/+1
| | | | | | | | Logging to file is going to add some further complexity that we don't want to squeeze into util.c. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* Avoid ugly 'end' members in netlink structuresDavid Gibson2022-09-291-9/+10
| | | | | | | | | | | | | | We use a number of complex structures to format messages to send to netlink. In some cases we add imaginary 'end' members not because they actually mean something on the wire, but so that we can use offsetof() on the member to determine the relevant size. Adding extra things to the structures for this is kinda nasty. We can use a different construct with offsetof and sizeof to avoid them. As a bonus this removes some cppcheck warnings about unused struct members. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* cppcheck: Reduce scope of some variablesDavid Gibson2022-09-291-2/+1
| | | | | | | Minor style improvement suggested by cppcheck. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Separately locate external interfaces for IPv4 and IPv6David Gibson2022-07-301-72/+7
| | | | | | | | | | | | | | | | | Now that the back end allows passt/pasta to use different external interfaces for IPv4 and IPv6, use that to do the right thing in the case that the host has IPv4 and IPv6 connectivity via different interfaces. If the user hasn't explicitly chosen an interface, separately search for a suitable external interface for each protocol. As a bonus, this substantially simplifies the external interface probe. It also eliminates a subtle confusing case where in some circumstances we would pick the first interface in interface index order, and sometimes in order of routes returned from netlink. On some network configurations that could cause tests to fail, because the logic in the tests was subtly different (it always used route order). Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* netlink: In nl_addr() and nl_route(), don't return before set requestStefano Brivio2022-05-011-2/+2
| | | | | Fixes: 22ed4467a413 ("treewide: Unchecked return value from library, CWE-252") Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* treewide: Unchecked return value from library, CWE-252Stefano Brivio2022-04-071-15/+25
| | | | | | | All instances were harmless, but it might be useful to have some debug messages here and there. Reported by Coverity. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* treewide: Mark constant references as constStefano Brivio2022-03-291-2/+2
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* netlink: Avoid left-over bytes in request on MTU configurationStefano Brivio2022-02-231-4/+7
| | | | | | | | | When nl_link() configures the MTU, it shouldn't send extra bytes, otherwise we'll get a kernel warning: netlink: 4 bytes leftover after parsing attributes in process `pasta'. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* netlink: Fix swapped v4/v6-only flags in external interface detectionStefano Brivio2022-01-261-2/+2
| | | | | | | The effect of this typo became visible in an IPv6-only environment, where passt wouldn't work at all. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp, netlink, HAS{BYTES_ACKED,MIN_RTT,GETRANDOM} and NETLINK_GET_STRICT_CHKStefano Brivio2022-01-261-2/+7
| | | | | | | | | | | | | | | | tcpi_bytes_acked and tcpi_min_rtt are only available on recent kernel versions: provide fall-back paths (incurring some grade of performance penalty). Support for getrandom() was introduced in Linux 3.17 and glibc 2.25: provide an alternate mechanism for that as well, reading from /dev/random. Also check if NETLINK_GET_STRICT_CHK is defined before using it: it's not strictly needed, we'll filter out irrelevant results from netlink anyway. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Drop <linux/ipv6.h> include, carry own ipv6hdr and opt_hdr definitionsStefano Brivio2022-01-261-1/+0
| | | | | | | This is the only remaining Linux-specific include -- drop it to avoid clang-tidy warnings and to make code more portable. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* netlink: Fix length of address attributeStefano Brivio2021-10-211-2/+2
| | | | | | | | ...I broke this while playing with clang-tidy, and didn't add tests for pasta's --config-net yet. Reported-by: Giuseppe Scrivano <gscrivan@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Add cppcheck target, test, and address resulting warningsStefano Brivio2021-10-211-5/+3
| | | | | | | ...mostly false positives, but a number of very relevant ones too, in tcp_get_sndbuf(), tcp_conn_from_tap(), and siphash PREAMBLE(). Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Fix build with gcc 7, use std=c99, enable some more Clang checkersStefano Brivio2021-10-211-36/+45
| | | | | | | | | | | | | | Unions and structs, you all have names now. Take the chance to enable bugprone-reserved-identifier, cert-dcl37-c, and cert-dcl51-cpp checkers in clang-tidy. Provide a ffsl() weak declaration using gcc built-in. Start reordering includes, but that's not enough for the llvm-include-order checker yet. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Add clang-tidy Makefile target and test, take care of warningsStefano Brivio2021-10-201-15/+18
| | | | | | | Most are just about style and form, but a few were actually serious mistakes (NDP-related). Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* netlink: NETLINK_GET_STRICT_CHK is not available on older kernelsStefano Brivio2021-10-191-3/+6
| | | | | | | For example on 4.19. Don't fail if we can't set it, filter on interface index in nl_addr(). Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* netlink, conf: Actually get prefix/mask lengthStefano Brivio2021-10-191-5/+8
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* netlink: Bring up interface even if neither MTU nor MAC address is configuredStefano Brivio2021-10-141-0/+5
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* netlink, pasta: Configure MTU of tap interface on --config-netStefano Brivio2021-10-141-16/+28
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* conf, tap: Split netlink and pasta functions, allow interface configurationStefano Brivio2021-10-141-0/+514
Move netlink routines to their own file, and use netlink to configure or fetch all the information we need, except for the TUNSETIFF ioctl. Move pasta-specific functions to their own file as well, add parameters and calls to configure the tap interface in the namespace. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>