aboutgitcodebugslistschat
path: root/dhcp.c
Commit message (Collapse)AuthorAgeFilesLines
* conf, fwd: Split notion of gateway/router from guest-visible host addressDavid Gibson2024-08-211-4/+6
| | | | | | | | | | | | | | | | | | | | | | | The @gw fields in the ip4_ctx and ip6_ctx give the (host's) default gateway. We use this for two quite distinct things: advertising the gateway that the guest should use (via DHCP, NDP and/or --config-net) and for a limited form of NAT. So that the guest can access services on the host, we map the gateway address within the guest to the loopback address on the host. Using the gateway address for this isn't necessarily the best choice for this purpose, certainly not for all circumstances. So, start off by splitting the notion of these into two different values: @guest_gw which is the gateway address the guest should use and @nat_host_loopback, which is the guest visible address to remap to the host's loopback. Usually nat_host_loopback will have the same value as guest_gw. However when --no-map-gw is specified we leave them unspecified instead. This means when we use nat_host_loopback, we don't need to separately check c->no_map_gw to see if it's relevant. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* fwd: Split notion of "our tap address" from gateway for IPv4David Gibson2024-08-211-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ip4.gw conflates 3 conceptually different things, which (for now) have the same value: 1. The router/gateway address as seen by the guest 2. An address to NAT to the host with --no-map-gw isn't specified 3. An address to use as source when nothing else makes sense Case 3 occurs in two situations: a) for our DHCP responses - since they come from passt internally there's no naturally meaningful address for them to come from b) for forwarded connections coming from an address that isn't guest accessible (localhost or the guest's own address). (b) occurs even with --no-map-gw, and the expected behaviour of forwarding local connections requires it. For IPv6 role (3) is now taken by ip6.our_tap_ll (which usually has the same value as ip6.gw). For future flexibility we may want to make this "address of last resort" different from the gateway address, so split them logically for IPv4 as well. Specifically, add a new ip4.our_tap_addr field for the address with this role, and initialise it to ip4.gw for now. Unlike IPv6 where we can always get a link-local address, we might not be able to get a (non 0.0.0.0) address here (e.g. if the host is disconnected or only has a point to point link with no gateway address). In that case we have to disable forwarding of inbound connections with guest-inaccessible source addresses. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* util: Helper for formatting MAC addressesDavid Gibson2024-08-211-3/+2
| | | | | | | | | There are a couple of places where we somewhat messily open code formatting an Ethernet like MAC address for display. Add an eth_ntop() helper for this. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* treewide: Standardise variable names for various packet lengthsDavid Gibson2024-05-021-3/+3
| | | | | | | | | | | | | | | | | | At various points we need to track the lengths of a packet including or excluding various different sets of headers. We don't always use the same variable names for doing so. Worse in some places we use the same name for different things: e.g. tcp_fill_headers[46]() use ip_len for the length including the IP headers, but then tcp_send_flag() which calls it uses it to mean the IP payload length only. To improve clarity, standardise on these names: dlen: L4 protocol payload length ("data length") l4len: plen + length of L4 protocol header l3len: l4len + length of IPv4/IPv6 header l2len: l3len + length of L2 (ethernet) header Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* util: move IP stuff from util.[ch] to ip.[ch]Laurent Vivier2024-03-061-0/+1
| | | | | | | | | | | | Introduce ip.[ch] file to encapsulate IP protocol handling functions and structures. Modify various files to include the new header ip.h when it's needed. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-ID: <20240303135114.1023026-5-lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* treewide: Make a bunch of pointer variables pointers to constDavid Gibson2024-01-161-4/+5
| | | | | | | | | | Sufficiently recent cppcheck (I'm using 2.13.0) seems to have added another warning for pointer variables which could be pointer to const but aren't. Use this to make a bunch of variables const pointers where they previously weren't for no particular reason. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* dhcp: put option 53 at the beginning2023_10_04.f851084Stas Sergeev2023-10-041-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... unless it is listed in 55. Many clients expect option 53 at the beginning. mTCP has this code: if ( resp->options[0] != 53 ) { TRACE_WARN(( "Dhcp: first option was not a Dhcp msg type\n" )); return; } wattcp32 has this: static int DHCP_is_ack (void) { const BYTE *opt = (const BYTE*) &dhcp_in.dh_opt[4]; return (opt[0] == DHCP_OPT_MSG_TYPE && opt[1] == 1 && opt[2] == DHCP_ACK); } static int DHCP_is_nack (void) { const BYTE *opt = (const BYTE*) &dhcp_in.dh_opt[4]; return (opt[0] == DHCP_OPT_MSG_TYPE && opt[1] == 1 && opt[2] == DHCP_NAK); } Link: https://bugs.passt.top/show_bug.cgi?id=77 Signed-off-by: Stas Sergeev <stsp2@yandex.ru> [sbrivio: s/options 53/option 53/ and s/other/others/ in comment] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* dhcp: Actually note down the length of options received by the clientStefano Brivio2023-09-271-0/+1
| | | | | | | | | | | | | | | It turns out we never used 'clen' until commit 1f24d3efb499 ("dhcp: support BOOTP clients"), and we always ignored option 55 (Parameter Request List), while, according to RFC 2132, we MUST try to insert the requested options in the order requested by the client. The commit mentioned above made this visible because now every client is reported as sending a DHCPREQUEST as an old BOOTP client, based on the lack of option 53 (that is, zero length). Fixes: b439984641ed ("merd: ARP and DHCP handlers, connection tracking fixes") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* dhcp: support BOOTP clientsStas Sergeev2023-09-071-2/+2
| | | | | | | | | | BOOTP clients do not use tagged messages for requests. As such, any message without the DHCP option 53, should be considered a BOOTP request. Link: https://bugs.passt.top/show_bug.cgi?id=72 Signed-off-by: Stas Sergeev <stsp2@yandex.ru> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Relicense to GPL 2.0, or any later versionStefano Brivio2023-04-061-1/+1
| | | | | | | | | | | | | | | | | | | In practical terms, passt doesn't benefit from the additional protection offered by the AGPL over the GPL, because it's not suitable to be executed over a computer network. Further, restricting the distribution under the version 3 of the GPL wouldn't provide any practical advantage either, as long as the passt codebase is concerned, and might cause unnecessary compatibility dilemmas. Change licensing terms to the GNU General Public License Version 2, or any later version, with written permission from all current and past contributors, namely: myself, David Gibson, Laine Stump, Andrea Bolognani, Paul Holzinger, Richard W.M. Jones, Chris Kuhn, Florian Weimer, Giuseppe Scrivano, Stefan Hajnoczi, and Vasiliy Ulyanov. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* dhcp: Fix netmask calculation for option 1 from prefix lengthStefano Brivio2023-02-161-1/+1
| | | | | | | | | | | | | Similar to the conf_print() fix from commit 4129764ecaeb ("conf: Fix mask calculation from prefix_len in conf_print()"): to calculate an IPv4 netmask from the prefix length, we need to left shift 32 all-one bits by 32 minus the prefix length -- not by the prefix length itself. Reported-by: Yalan Zhang <yalzhang@redhat.com> Fixes: dd09cceaee21 ("Minor improvements to IPv4 netmask handling") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* conf, udp: Drop mostly duplicated dns_send arrays, rename related fieldsStefano Brivio2022-11-161-3/+2
| | | | | | | | | | | | | | | | | | | | | Given that we use just the first valid DNS resolver address configured, or read from resolv.conf(5) on the host, to forward DNS queries to, in case --dns-forward is used, we don't need to duplicate dns[] to dns_send[]: - rename dns_send[] back to dns[]: those are the resolvers we advertise to the guest/container - for forwarding purposes, instead of dns[], use a single field (for each protocol version): dns_host - and rename dns_fwd to dns_match, so that it's clear this is the address we are matching DNS queries against, to decide if they need to be forwarded Suggested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* conf: Split the notions of read DNS addresses and offered onesStefano Brivio2022-11-041-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | With --dns-forward, if the host has a loopback address configured as DNS server, we should actually use it to forward queries, but, if --no-map-gw is passed, we shouldn't offer the same address via DHCP, NDP and DHCPv6, because it's not going to be reachable. Problematic configuration: * systemd-resolved configuring the usual 127.0.0.53 on the host: we read that from /etc/resolv.conf * --dns-forward specified with an unrelated address, for example 198.51.100.1 We still want to forward queries to 127.0.0.53, if we receive one directed to 198.51.100.1, so we can't drop 127.0.0.53 from our list: we want to use it for forwarding. At the same time, we shouldn't offer 127.0.0.53 to the guest or container either. With this change, I'm only covering the case of automatically configured DNS servers from /etc/resolv.conf. We could extend this to addresses configured with command-line options, but I don't really see a likely use case at this point. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Use typing to reduce chances of IPv4 endianness errorsDavid Gibson2022-11-041-4/+7
| | | | | | | | | | | | | | | | | | | We recently corrected some errors handling the endianness of IPv4 addresses. These are very easy errors to make since although we mostly store them in network endianness, we sometimes need to manipulate them in host endianness. To reduce the chances of making such mistakes again, change to always using a (struct in_addr) instead of a bare in_addr_t or uint32_t to store network endian addresses. This makes it harder to accidentally do arithmetic or comparisons on such addresses as if they were host endian. We introduce a number of IN4_IS_ADDR_*() helpers to make it easier to directly work with struct in_addr values. This has the additional benefit of making the IPv4 and IPv6 paths more visually similar. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Minor improvements to IPv4 netmask handlingDavid Gibson2022-11-041-2/+4
| | | | | | | | | | | | | | | | There are several minor problems with our parsing of IPv4 netmasks (-n). First, we don't reject nonsensical netmasks like 0.255.0.255. Address this structurally by using prefix length instead of netmask as the primary variable, only converting (and validating) when we need to. This has the added benefit of making some things more uniform with the IPv6 path. Second, when the user specifies a prefix length, we truncate the output from strtol() to an integer, which means we would treat -n 4294967320 as valid (equivalent to 24). Fix types to check for this. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* dhcp: Use tap_udp4_send() helper in dhcp()David Gibson2022-10-191-16/+2
| | | | | | | | | | The IPv4 specific dhcp() manually constructs L2 and IP headers to send its DHCP reply packet, unlike its IPv6 equivalent in dhcpv6.c which uses the tap_udp6_send() helper. Now that we've broaded the parameters to tap_udp4_send() we can use it in dhcp() to avoid some duplicated logic. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tap: Remove unhelpeful vnet_pre optimization from tap_send()David Gibson2022-10-191-1/+1
| | | | | | | | | | | | | | | | Callers of tap_send() can optionally use a small optimization by adding extra space for the 4 byte length header used on the qemu socket interface. tap_ip_send() is currently the only user of this, but this is used only for "slow path" ICMP and DHCP packets, so there's not a lot of value to the optimization. Worse, having the two paths here complicates the interface and makes future cleanups difficult, so just remove it. I have some plans to bring back the optimization in a more general way in future, but for now it's just in the way. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Add csum_ip4_header() helper to calculate IPv4 header checksumsDavid Gibson2022-10-191-2/+1
| | | | | | | | We calculate IPv4 header checksums in at least two places, in dhcp() and in tap_ip_send. Add a helper to handle this calculation in both places. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Add csum_udp4() helper for calculating UDP over IPv4 checksumsDavid Gibson2022-10-191-1/+1
| | | | | | | | | | | | | At least two places in passt fill in UDP over IPv4 checksums, although since UDP checksums are optional with IPv4 that just amounts to storing a 0 (in tap_ip_send()) or leaving a 0 from an earlier initialization (in dhcp()). For consistency, add a helper for this "calculation". Just for the heck of it, add the option (compile time disabled for now) to calculate real UDP checksums. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Move logging functions to a new file, log.cStefano Brivio2022-10-141-0/+1
| | | | | | | | Logging to file is going to add some further complexity that we don't want to squeeze into util.c. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* Make substructures for IPv4 and IPv6 specific context informationDavid Gibson2022-07-301-11/+11
| | | | | | | | | | | | The context structure contains a batch of fields specific to IPv4 and to IPv6 connectivity. Split those out into a sub-structure. This allows the conf_ip4() and conf_ip6() functions, which take the entire context but touch very little of it, to be given more specific parameters, making it clearer what it affects without stepping through the code. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* treewide: Mark constant references as constStefano Brivio2022-03-291-4/+5
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* treewide: Packet abstraction with mandatory boundary checksStefano Brivio2022-03-291-18/+36
| | | | | | | | | | | | | | | | | | | | Implement a packet abstraction providing boundary and size checks based on packet descriptors: packets stored in a buffer can be queued into a pool (without storage of its own), and data can be retrieved referring to an index in the pool, specifying offset and length. Checks ensure data is not read outside the boundaries of buffer and descriptors, and that packets added to a pool are within the buffer range with valid offset and indices. This implies a wider rework: usage of the "queueing" part of the abstraction mostly affects tap_handler_{passt,pasta}() functions and their callees, while the "fetching" part affects all the guest or tap facing implementations: TCP, UDP, ICMP, ARP, NDP, DHCP and DHCPv6 handlers. Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* dhcp: Minimum option length implied by RFC 951 is 60 bytes, not 62Stefano Brivio2022-03-291-3/+5
| | | | | | | In section 3 ("Packet Format"), "vend" is 64 bytes long, minus the magic that's 60 bytes, not 62. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* arp, dhcp: Fix strict aliasing warnings reported by gcc 4.9 with -OfastStefano Brivio2022-02-281-7/+7
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* conf, udp: Introduce basic DNS forwardingStefano Brivio2022-02-211-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | For compatibility with libslirp/slirp4netns users: introduce a mechanism to map, in the UDP routines, an address facing guest or namespace to the first IPv4 or IPv6 address resulting from configuration as resolver. This can be enabled with the new --dns-forward option. This implies that sourcing and using DNS addresses and search lists, passed via command line or read from /etc/resolv.conf, is not bound anymore to DHCP/DHCPv6/NDP usage: for example, pasta users might just want to use addresses from /etc/resolv.conf as mapping target, while not passing DNS options via DHCP. Reflect this in all the involved code paths by differentiating DHCP/DHCPv6/NDP usage from DNS configuration per se, and in the new options --dhcp-dns, --dhcp-search for pasta, and --no-dhcp-dns, --no-dhcp-search for passt. This should be the last bit to enable substantial compatibility between slirp4netns.sh and slirp4netns(1): pass the --dns-forward option from the script too. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Address new clang-tidy warnings from LLVM 13.0.1Stefano Brivio2022-01-301-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | clang-tidy from LLVM 13.0.1 reports some new warnings from these checkers: - altera-unroll-loops, altera-id-dependent-backward-branch: ignore for the moment being, add a TODO item - bugprone-easily-swappable-parameters: ignore, nothing to do about those - readability-function-cognitive-complexity: ignore for the moment being, add a TODO item - altera-struct-pack-align: ignore, alignment is forced in protocol headers - concurrency-mt-unsafe: ignore for the moment being, add a TODO item Fix bugprone-implicit-widening-of-multiplication-result warnings, though, that's doable and they seem to make sense. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Drop <linux/ipv6.h> include, carry own ipv6hdr and opt_hdr definitionsStefano Brivio2022-01-261-2/+0
| | | | | | | This is the only remaining Linux-specific include -- drop it to avoid clang-tidy warnings and to make code more portable. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Add cppcheck target, test, and address resulting warningsStefano Brivio2021-10-211-6/+7
| | | | | | | ...mostly false positives, but a number of very relevant ones too, in tcp_get_sndbuf(), tcp_conn_from_tap(), and siphash PREAMBLE(). Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Fix build with gcc 7, use std=c99, enable some more Clang checkersStefano Brivio2021-10-211-8/+9
| | | | | | | | | | | | | | Unions and structs, you all have names now. Take the chance to enable bugprone-reserved-identifier, cert-dcl37-c, and cert-dcl51-cpp checkers in clang-tidy. Provide a ffsl() weak declaration using gcc built-in. Start reordering includes, but that's not enough for the llvm-include-order checker yet. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Add clang-tidy Makefile target and test, take care of warningsStefano Brivio2021-10-201-2/+1
| | | | | | | Most are just about style and form, but a few were actually serious mistakes (NDP-related). Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Shrink binary size by dropping static initialisersStefano Brivio2021-10-051-7/+17
| | | | | | ...from 11MiB to 155KiB for 'make avx2', 95KiB with -Os and stripped. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* conf, dhcp, ndp: Fix message about default MTU, make NDP consistentStefano Brivio2021-09-091-2/+1
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* dhcp: Send option 121 if the default gateway is not on the assigned subnetStefano Brivio2021-09-011-0/+11
| | | | | | | | This enables CirrOS with udhcpc to set up a route to a gateway that's not on the assigned subnet, together with: https://bugs.launchpad.net/cirros/+bug/1190372 Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt, pasta: Introduce command-line options and port re-mappingStefano Brivio2021-09-011-0/+9
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* checksum: Introduce AVX2 implementation, unify helpersStefano Brivio2021-07-261-1/+2
| | | | | | | | | | | | | | | | | | | Provide an AVX2-based function using compiler intrinsics for TCP/IP-style checksums. The load/unpack/add idea and implementation is largely based on code from BESS (the Berkeley Extensible Software Switch) licensed as 3-Clause BSD, with a number of modifications to further decrease pipeline stalls and to minimise cache pollution. This speeds up considerably data paths from sockets to tap interfaces, decreasing overhead for checksum computation, with 16-64KiB packet buffers, from approximately 11% to 7%. The rest is just syscalls at this point. While at it, provide convenience targets in the Makefile for avx2, avx2_debug, and debug targets -- these simply add target-specific CFLAGS to the build. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* dhcp: Set MTU option (26) to 65520 bytesStefano Brivio2021-07-211-0/+1
| | | | | | | | This value should work for all tap-like interfaces and is rather convenient for performance testing. It will be configurable later on. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Add PASTA mode, major reworkStefano Brivio2021-07-171-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PASTA (Pack A Subtle Tap Abstraction) provides quasi-native host connectivity to an otherwise disconnected, unprivileged network and user namespace, similarly to slirp4netns. Given that the implementation is largely overlapping with PASST, no separate binary is built: 'pasta' (and 'passt4netns' for clarity) both link to 'passt', and the mode of operation is selected depending on how the binary is invoked. Usage example: $ unshare -rUn # echo $$ 1871759 $ ./pasta 1871759 # From another terminal # udhcpc -i pasta0 2>/dev/null # ping -c1 pasta.pizza PING pasta.pizza (64.190.62.111) 56(84) bytes of data. 64 bytes from 64.190.62.111 (64.190.62.111): icmp_seq=1 ttl=255 time=34.6 ms --- pasta.pizza ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 34.575/34.575/34.575/0.000 ms # ping -c1 spaghetti.pizza PING spaghetti.pizza(2606:4700:3034::6815:147a (2606:4700:3034::6815:147a)) 56 data bytes 64 bytes from 2606:4700:3034::6815:147a (2606:4700:3034::6815:147a): icmp_seq=1 ttl=255 time=29.0 ms --- spaghetti.pizza ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 28.967/28.967/28.967/0.000 ms This entails a major rework, especially with regard to the storage of tracked connections and to the semantics of epoll(7) references. Indexing TCP and UDP bindings merely by socket proved to be inflexible and unsuitable to handle different connection flows: pasta also provides Layer-2 to Layer-2 socket mapping between init and a separate namespace for local connections, using a pair of splice() system calls for TCP, and a recvmmsg()/sendmmsg() pair for UDP local bindings. For instance, building on the previous example: # ip link set dev lo up # iperf3 -s $ iperf3 -c ::1 -Z -w 32M -l 1024k -P2 | tail -n4 [SUM] 0.00-10.00 sec 52.3 GBytes 44.9 Gbits/sec 283 sender [SUM] 0.00-10.43 sec 52.3 GBytes 43.1 Gbits/sec receiver iperf Done. epoll(7) references now include a generic part in order to demultiplex data to the relevant protocol handler, using 24 bits for the socket number, and an opaque portion reserved for usage by the single protocol handlers, in order to track sockets back to corresponding connections and bindings. A number of fixes pertaining to TCP state machine and congestion window handling are also included here. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* dhcp, ndp, dhcpv6: Support for multiple DNS servers, search listStefano Brivio2021-05-211-3/+94
| | | | | | | | | | | Add support for a variable amount of DNS servers, including zero, from /etc/resolv.conf, in DHCP, NDP and DHCPv6 implementations. Introduce support for domain search list for DHCP (RFC 3397), NDP (RFC 8106), and DHCPv6 (RFC 3646), also sourced from /etc/resolv.conf. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* dhcp: Increase lease time to maximum allowed valueStefano Brivio2021-05-211-5/+5
| | | | | | ...to make things simpler at least for the moment being. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* dhcp: Remove left-over comment about "forced" optionsStefano Brivio2021-03-261-2/+1
| | | | | | | For simplicity, we just send all available options, so there's no distinction between forced and requested option anymore. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Run in background, add message logging with severitiesStefano Brivio2021-03-181-5/+5
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Assorted fixes from "fresh eyes" reviewStefano Brivio2021-02-211-6/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A bunch of fixes not worth single commits at this stage, notably: - make buffer, length parameter ordering consistent in ARP, DHCP, NDP handlers - strict checking of buffer, message and option length in DHCP handler (a malicious client could have easily crashed it) - set up forwarding for IPv4 and IPv6, and masquerading with nft for IPv4, from demo script - get rid of separate slow and fast timers, we don't save any overhead that way - stricter checking of buffer lengths as passed to tap handlers - proper dequeuing from qemu socket back-end: I accidentally trashed messages that were bundled up together in a single tap read operation -- the length header tells us what's the size of the next frame, but there's no apparent limit to the number of messages we get with one single receive - rework some bits of the TCP state machine, now passive and active connection closes appear to be robust -- introduce a new FIN_WAIT_1_SOCK_FIN state indicating a FIN_WAIT_1 with a FIN flag from socket - streamline TCP option parsing routine - track TCP state changes to stderr (this is temporary, proper debugging and syslogging support pending) - observe that multiplying a number by four might very well change its value, and this happens to be the case for the data offset from the TCP header as we check if it's the same as the total length to find out if it's a duplicated ACK segment - recent estimates suggest that the duration of a millisecond is closer to a million nanoseconds than a thousand of them, this trend is now reflected into the timespec_diff_ms() convenience routine Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: New design and implementation with native Layer 4 socketsStefano Brivio2021-02-161-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a reimplementation, partially building on the earlier draft, that uses L4 sockets (SOCK_DGRAM, SOCK_STREAM) instead of SOCK_RAW, providing L4-L2 translation functionality without requiring any security capability. Conceptually, this follows the design presented at: https://gitlab.com/abologna/kubevirt-and-kvm/-/blob/master/Networking.md The most significant novelty here comes from TCP and UDP translation layers. In particular, the TCP state and translation logic follows the intent of being minimalistic, without reimplementing a full TCP stack in either direction, and synchronising as much as possible the TCP dynamic and flows between guest and host kernel. Another important introduction concerns addressing, port translation and forwarding. The Layer 4 implementations now attempt to bind on all unbound ports, in order to forward connections in a transparent way. While at it: - the qemu 'tap' back-end can't be used as-is by qrap anymore, because of explicit checks now introduced in qemu to ensure that the corresponding file descriptor is actually a tap device. For this reason, qrap now operates on a 'socket' back-end type, accounting for and building the additional header reporting frame length - provide a demo script that sets up namespaces, addresses and routes, and starts the daemon. A virtual machine started in the network namespace, wrapped by qrap, will now directly interface with passt and communicate using Layer 4 sockets provided by the host kernel. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Add IPv6 and NDP support, further fixes for IPv4 CTStefano Brivio2021-02-161-0/+1
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* merd: Rename to PASSTStefano Brivio2021-02-161-3/+3
| | | | | | Plug A Simple Socket Transport. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* merd: ARP and DHCP handlers, connection tracking fixesStefano Brivio2021-02-161-0/+218
With this, merd provides a fully functional IPv4 environment to guests, requiring a single capability, CAP_NET_RAW. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>