passt - Plug A Simple Socket Transport

	Commit message (Collapse)	Author	Age	Files	Lines
*	tap: Split tap specific and L2 (ethernet) headers	David Gibson	2024-05-02	1	-9/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	In some places (well, actually only UDP now) we use struct tap_hdr to represent both tap backend specific and L2 ethernet headers. Handling these together seemed like a good idea at the time, but Laurent's changes in the TCP code working towards vhost-user support suggest that treating them separately is more useful, more often. Alter struct tap_hdr to represent only the TAP backend specific headers. Updated related helpers and the UDP code to match. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	udp: Correctly look up outbound socket with port remappings	David Gibson	2024-04-25	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit bb9bf0bb ("tcp, udp: Don't precompute port remappings in epoll references") changed the epoll reference for UDP sockets to include the bound port as seen by the socket itself, rather than the bound port it would be translated to on the guest side. As a side effect, it also means that udp_tap_map[] is indexed by the bound port on the host side, rather than on the guest side. This is consistent and a good idea, however we forgot to account for it when finding the correct outgoing socket for packets originating in the guest. This means that if forwarding UDP inbound with a port number change, reply packets would be misdirected. Fix this by applying the reverse mapping before looking up the socket in udp_tap_handler(). While we're at it, use 'port' directly instead of 'uref.port' in udp_sock_init(). Those now always have the same value - failing to realise that is the same error as above. Reported-by: Laurent Jacquot <jk@lutty.net> Link: https://bugs.passt.top/show_bug.cgi?id=87 Fixes: bb9bf0bb8f57 ("tcp, udp: Don't precompute port remappings in epoll references") Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	udp: Translate source address of resolver only for DNS remapped queries	Stefano Brivio	2024-03-18	1	-6/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Paul reports that if pasta is configured with --dns-forward, and the container queries a resolver which is configured on the host directly, without using the address given for --dns-forward, we'll translate the source address of the response pretending it's coming from the address passed as --dns-forward, and the client will discard the reply. That is, $ cat /etc/resolv.conf 198.51.100.1 $ pasta --config-net --dns-forward 192.0.2.1 nslookup passt.top will not work, because we change the source address of the reply from 198.51.100.1 to 192.0.2.1. But the client contacted 198.51.100.1, and it's from that address that it expects an answer. Add a PORT_DNS_FWD flag for tap-facing ports, which is triggered by activity in the opposite direction as the other flags. If the tap-facing port was seen sending a DNS query that was remapped, we'll remap the source address of the response, otherwise we'll leave it unaffected. Reported-by: Paul Holzinger <pholzing@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	tap: Rename tap_iov_{base,len}	David Gibson	2024-03-14	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \|	These two functions are typically used to calculate values to go into the iov_base and iov_len fields of a struct iovec. They don't have to be used for that, though. Rename them in terms of what they actually do: calculate the base address and total length of the complete frame, including both L2 and tap specific headers. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	tap: Extend tap_send_frames() to allow multi-buffer frames	David Gibson	2024-03-14	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	tap_send_frames() takes a vector of buffers and requires exactly one frame per buffer. We have future plans where we want to have multiple buffers per frame in some circumstances, so extend tap_send_frames() to take the number of buffers per frame as a parameter. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Improve comment to rembufs calculation] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	udp: Use existing helper for UDP checksum on inbound IPv6 packets	David Gibson	2024-03-13	1	-4/+1
\| \| \| \| \| \| \| \| \| \| \|	Currently we open code the calculation of the UDP checksum in udp_update_hdr6(). We calling a helper to handle the IPv6 pseudo-header, and preset the checksum field to 0 so an uninitialised value doesn't get folded in. We already have a helper to do this: csum_udp6() which we use in some slow paths. Use it here as well. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	udp: Avoid unnecessary pointer in udp_update_hdr4()	David Gibson	2024-03-13	1	-9/+9
\| \| \| \| \| \| \| \| \|	We carry around the source address as a pointer to a constant struct in_addr. But it's silly to carry around a 4 or 8 byte pointer to a 4 byte IPv4 address. Just copy the IPv4 address around by value. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	udp: Re-order udp_update_hdr[46] for clarity and brevity	David Gibson	2024-03-13	1	-26/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The order of things in these functions is a bit odd for historical reasons. We initialise some IP header fields early, the more later after making some tests. Likewise we declare some variables without initialisation, but then unconditionally set them to values we could calculate at the start of the function. Previous cleanups have removed the reasons for some of these choices, so reorder for clarity, and where possible move the first assignment into an initialiser. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	udp: Pass data length explicitly to to udp_update_hdr[46]	David Gibson	2024-03-13	1	-12/+16
\| \| \| \| \| \| \| \| \| \| \| \|	These functions take an index to the L2 buffer whose header information to update. They use that for two things: to locate the buffer pointer itself, and to retrieve the length of the received message from the paralllel udp[46]_l2_mh_sock array. The latter is arguably a failure to separate concerns. Change these functions to explicitly take a buffer pointer and payload length as parameters. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	udp: Consistent port variable names in udp_update_hdr[46]	David Gibson	2024-03-13	1	-18/+18
\| \| \| \| \| \| \| \| \|	In these functions we have 'dstport' for the destination port, but 'src_port' for the source port. Change the latter to 'srcport' for consistency. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	udp: Refactor udp_sock[46]_iov_init()	David Gibson	2024-03-13	1	-52/+50
\| \| \| \| \| \| \| \| \| \| \| \|	Each of these functions have 3 essentially identical loops in a row. Merge the loops into a single common udp_sock_iov_init() function, calling udp_sock[46]_iov_init_one() helpers to initialize each "slot" in the various parallel arrays. This is slightly neater now, and more naturally allows changes we want to make where more initialization will become common between IPv4 and IPv6. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	tap: make tap_update_mac() generic	Laurent Vivier	2024-03-06	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	Use ethhdr rather than tap_hdr. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-ID: <20240303135114.1023026-9-lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	checksum: introduce functions to compute the header part checksum for TCP/UDP	Laurent Vivier	2024-03-06	1	-8/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The TCP and UDP checksums are computed using the data in the TCP/UDP payload but also some informations in the IP header (protocol, length, source and destination addresses). We add two functions, proto_ipv4_header_psum() and proto_ipv6_header_psum(), to compute the checksum of the IP header part. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-ID: <20240303135114.1023026-8-lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	checksum: use csum_ip4_header() in udp.c and tcp.c	Laurent Vivier	2024-03-06	1	-18/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We can find the same function to compute the IPv4 header checksum in tcp.c, udp.c and tap.c Use the function defined for tap.c, csum_ip4_header(), but with the code used in tcp.c and udp.c as it doesn't need a fully initialiazed IPv4 header, only protocol, tot_len, saddr and daddr. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-ID: <20240303135114.1023026-7-lvivier@redhat.com> [dwg: Fix weird cppcheck regression; it appears to be a problem in pre-existing code, but somehow this patch is exposing it] Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	udp: little cleanup in udp_update_hdrX() to prepare future changes	Laurent Vivier	2024-03-06	1	-20/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	in udp_update_hdr4(): Assign the source address to src, either b->s_in.sin_addr, c->ip4.dns_match or c->ip4.gw and then set b->iph.saddr to src->s_addr. in udp_update_hdr6(): Assign the source address to src, either b->s_in6.sin6_addr, c->ip6.dns_match, c->ip6.gw or c->ip6.addr_ll. Assign the destination to dst, either c->ip6.addr_seen or &c->ip6.addr_ll_seen. Then set dst to b->ip6h.daddr and src to b->ip6h.saddr. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-ID: <20240303135114.1023026-6-lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	util: move IP stuff from util.[ch] to ip.[ch]	Laurent Vivier	2024-03-06	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Introduce ip.[ch] file to encapsulate IP protocol handling functions and structures. Modify various files to include the new header ip.h when it's needed. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-ID: <20240303135114.1023026-5-lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	fwd: Rename port_fwd.[ch] and their contents	David Gibson	2024-02-29	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \|	Currently port_fwd.[ch] contains helpers related to port forwarding, particular automatic port forwarding. We're planning to allow much more flexible sorts of forwarding, including both port translation and NAT based on the flow table. This will subsume the existing port forwarding logic, so rename port_fwd.[ch] to fwd.[ch] with matching updates to all the names within. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	tcp, udp: Don't precompute port remappings in epoll references	David Gibson	2024-02-29	1	-6/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The epoll references for both TCP listening sockets and UDP sockets includes a port number. This gives the destination port that traffic to that socket will be sent to on the other side. That will usually be the same as the socket's bound port, but might not if the -t, -u, -T or -U options are given with different original and forwarded port numbers. As we move towards a more flexible forwarding model for passt, it's going to become possible for that destination port to vary depending on more things (for example the source or destination address). So, it will no longer make sense to have a fixed value for a listening socket. Change to simpler semantics where this field in the reference gives the bound port of the socket. We apply the translations to the correct destination port later on, when we're actually forwarding. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	inany: Provide more conveniently typed constants for special addresses	David Gibson	2024-02-29	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Our inany_addr type is used in some places to represent either IPv4 or IPv6 addresses, and we plan to use it more widely. We don't yet provide constants of this type for special addresses (loopback and "any"). Add some of these, both the IPv4 and IPv6 variants of those addresses, but typed as union inany_addr. To avoid actually adding more things to .data we can use some macros and casting to overlay the IPv6 versions of these with the standard library's in6addr_loopback and in6addr_any. For the IPv4 versions we need to create new constant globals. For complicated historical reasons, the standard library doesn't provide constants for IPv4 loopback and any addresses as struct in_addr. It just has macros of type in_addr_t == uint32_t, which has some gotchas w.r.t. endianness. We can use some more macros to address this lack, using macros to effectively create these IPv4 constants as pieces of the inany constants above. We use this last to avoid some awkward temporary variables just used to get an address of an IPv4 loopback address. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	udp: Remove unnecessary test for unspecified addr_out	David Gibson	2024-02-29	1	-4/+2
\| \| \| \| \| \| \| \| \| \|	If the configured output address is unspecified, we don't set the bind address to it when creating a new socket in udp_tap_handler(). That sounds sensible, but what we're leaving the bind address as is, exactly, the unspecified address, so this test makes no difference. Remove it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	udp: Fix incorrect usage of IPv6 state in IPv4 path	David Gibson	2024-02-29	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	When forwarding IPv4 packets in udp_tap_handler(), we incorrectly use an IPv6 address test on our IPv4 address (which could cause an out of bounds access), and possibly set our bind interface to the IPv6 interface based on it. Adjust to correctly look at the IPv4 address and IPv4 interface. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	udp: Small streamline to udp_update_hdr4()	David Gibson	2024-02-29	1	-8/+9
\| \| \| \| \| \| \| \| \| \|	Streamline the logic here slightly, by introducing a 'src' temporary for brevity. We also transform the logic for setting/clearing PORT_LOOPBACK. This makes udp_update_hdr4() more closely match the corresponding logic from udp_update_udp6(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	udp: Set pif in epoll reference for ephemeral host sockets	David Gibson	2024-02-29	1	-2/+9
\| \| \| \| \| \| \| \| \| \| \| \| \|	The udp_epoll_ref contains a field for the pif to which the socket belongs. We fill this in for permanent sockets created with udp_sock_init() and for spliced sockets, however, we omit it for ephemeral sockets created for tap originated flows. This is a bug, although we currently get away with it, because we don't consult that field for such flows. Correctly fill it in. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	udp: Don't attempt to translate a 0.0.0.0 source address	David Gibson	2024-02-29	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If an incoming packet has a source address of 0.0.0.0 we translate that to the gateway address. This doesn't really make sense, because we have no way to do a reverse translation for reply packets. Certain UDP protocols do use an unspecified source address in some circumstances (e.g. DHCP). These generally either require no reply, a multicast reply, or provide a suitable reply address by other means. In none of those cases does translating it in passt/pasta make sense. The best we can really do here is just leave it as is. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	treewide: Use sa_family_t for address family variables	David Gibson	2024-02-27	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Sometimes we use sa_family_t for variables and parameters containing a socket address family, other times we use a plain int. Since sa_family_t is what's actually used in struct sockaddr and friends, standardise on that. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	udp: Fix 16-bit overflow in udp_invert_portmap()2024_02_20.1e6f92b	David Gibson	2024-02-20	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The code in udp_invert_portmap() is written based on an incorrect understanding of C's (arcane) integer promotion rules. We calculate '(in_port_t)i + delta' expecting the result to be of type in_port_t (16 bits). However "small integer types" (those narrower than 'int') are always promoted to int for expressions, meaning this calculation can overrun the rdelta[] array. Fix this, and use a new intermediate for the index, to make it very clear what it's type is. We also change i to unsigned, to avoid any possible confusion from mixing signed and unsigned types. Link: https://bugs.passt.top/show_bug.cgi?id=80 Reported-by: Laurent Jacquot <jk@lutty.net> Suggested-by: Laurent Jacquot <jk@lutty.net> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	udp: Assertion in udp_invert_portmap() can be calculated at compile time	David Gibson	2024-02-20	1	-1/+2
\| \| \| \| \| \| \| \|	All the values in this ASSERT() are known at compile time, so this can be converted to a static_assert(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	udp: udp_sock_init_ns() partially duplicats udp_port_rebind_outbound()	David Gibson	2024-02-14	1	-48/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Usually automatically forwarded UDP outbound ports are set up by udp_port_rebind_outbound() called from udp_timer(). However, the very first time they're created and bound is by udp_sock_init_ns() called from udp_init(). udp_sock_init_ns() is essentially an unnecessary cut down version of udp_port_rebind_outbound(), so we can jusat remove it. Doing so does require moving udp_init() below udp_port_rebind_outbound()'s definition. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	udp: Don't prematurely (and incorrectly) set up automatic inbound forwards	David Gibson	2024-02-14	1	-17/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For automated inbound port forwarding in pasta mode we scan bound ports within the guest namespace via /proc and bind matching ports on the host to listen for packets. For UDP this is usually handled by udp_timer() which calls port_fwd_scan_udp() followed by udp_port_rebind(). However there's one initial scan before the the UDP timer is started: we call port_fwd_scan_udp() from port_fwd_init(), and actually bind the resulting ports in udp_sock_init_init() called from udp_init(). Unfortunately, the version in udp_sock_init_init() isn't correct. It unconditionally opens a new socket for every forwarded port, even if a socket has already been explicit created with the -u option. If the explicitly forwarded ports have particular configuration (such as a specific bound address address, or one implied by the -o option) those will not be replicated in the new socket. We essentially leak the original correctly configured socket, replacing it with one which might not be right. We could make udp_sock_init_init() use udp_port_rebind() to get that right, but there's actually no point doing so: * The initial bind was introduced by ccf6d2a7b48d ("udp: Actually bind detected namespace ports in init namespace") at which time we didn't periodically scan for bound UDP ports. Periodic scanning was introduced in 457ff122e ("udp,pasta: Periodically scan for ports to automatically forward") making the bind from udp_init() redundant. * At the time of udp_init(), programs in the guest namespace are likely not to have started yet (unless attaching a pre-existing namespace) so there's likely not anything to scan for anyway. So, simply remove the initial, broken socket create/bind, allowing automatic port forwards to be created the first time udp_timer() runs. Reported-by: Laurent Jacquot <jk@lutty.net> Suggested-by: Laurent Jacquot <jk@lutty.net> Link: https://bugs.passt.top/show_bug.cgi?id=79 Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	treewide: Standardise on 'now' for current timestamp variables	David Gibson	2024-01-22	1	-8/+8
\| \| \| \| \| \| \| \| \|	In a number of places we pass around a struct timespec representing the (more or less) current time. Sometimes we call it 'now', and sometimes we call it 'ts'. Standardise on the more informative 'now'. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	treewide: Make a bunch of pointer variables pointers to const	David Gibson	2024-01-16	1	-3/+3
\| \| \| \| \| \| \| \| \| \|	Sufficiently recent cppcheck (I'm using 2.13.0) seems to have added another warning for pointer variables which could be pointer to const but aren't. Use this to make a bunch of variables const pointers where they previously weren't for no particular reason. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	util: Make sock_l4() treat empty string ifname like NULL	David Gibson	2023-12-27	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \|	sock_l4() takes NULL for ifname if you don't want to bind the socket to a particular interface. However, for a number of the callers, it's more natural to use an empty string for that case. Change sock_l4() to accept either NULL or an empty string equivalently, and simplify some callers using that change. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	treewide: Avoid in_addr_t	David Gibson	2023-12-27	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	IPv4 addresses can be stored in an in_addr_t or a struct in_addr. The former is just a type alias to a 32-bit integer, so doesn't really give us any type checking. Therefore we generally prefer the structure, since we mostly want to treat IP address as opaque objects. Fix a few places where we still use in_addr_t, but can just as easily use struct in_addr. Note there are still some uses of in_addr_t in conf.c, but those are justified: since they're doing prefix calculations, they actually need to look at the internals of the address as an integer. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	treewide: Use IN4ADDR_LOOPBACK_INIT more widely	David Gibson	2023-12-27	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \|	We already define IN4ADDR_LOOPBACK_INIT to initialise a struct in_addr to the loopback address without delving into its internals. However there are some places we don't use it, and explicitly look at the internal structure of struct in_addr, which we generally want to avoid. Use the define more widely to avoid that. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	port_fwd, util: Don't bind UDP ports with opposite-side bound TCP ports	Stefano Brivio	2023-11-22	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When pasta periodically scans bound ports and binds them on the other side in order to forward traffic, we bind UDP ports for corresponding TCP port numbers, too, to support protocols and applications such as iperf3 which use UDP port numbers matching the ones used by the TCP data connection. If we scan UDP ports in order to bind UDP ports, we skip detection of the UDP ports we already bound ourselves, to avoid looping back our own ports. Same with scanning and binding TCP ports. But if we scan for TCP ports in order to bind UDP ports, we need to skip bound TCP ports too, otherwise, as David pointed out: - we find a bound TCP port on side A, and bind the corresponding TCP and UDP ports on side B - at the next periodic scan, we find that UDP port bound on side B, and we bind the corresponding UDP port on side A - at this point, we unbind that UDP port on side B: we would otherwise loop back our own port. To fix this, we need to avoid binding UDP ports that we already bound, on the other side, as a consequence of finding a corresponding bound TCP port. Reproducing this issue is straightforward: ./pasta -- iperf3 -s # Wait one second, then from another terminal: iperf3 -c ::1 -u Reported-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp> Analysed-by: David Gibson <david@gibson.dropbear.id.au> Fixes: 457ff122e33c ("udp,pasta: Periodically scan for ports to automatically forward") Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	udp,pasta: Periodically scan for ports to automatically forward	David Gibson	2023-11-19	1	-0/+74
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	pasta supports automatic port forwarding, where we look for listening sockets in /proc/net (in both namespace and outside) and establish port forwarding to match. For TCP we do this scan both at initial startup, then periodically thereafter. For UDP however, we currently only scan at start. So unlike TCP we won't update forwarding to handle services that start after pasta has begun. There's no particular reason for that, other than that we didn't implement it. So, remove that difference, by scanning for new UDP forwards periodically too. The logic is basically identical to that for TCP, but it needs some changes to handle the mildly different data structures in the UDP case. Link: https://bugs.passt.top/show_bug.cgi?id=45 Link: https://github.com/rootless-containers/rootlesskit/issues/383 Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	udp: Remove socket from udp_{tap,splice}_map when timed out	David Gibson	2023-11-07	1	-5/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We save sockets bound to particular ports in udp_{tap,splice}_map for reuse later. If they're not used for a time, we time them out and close them. However, when that happened, we weren't actually removing the fds from the relevant map. That meant that later interactions on the same port could get a stale fd from the map. The stale fd might be closed, leading to unexpected EBADF errors, or it could have been re-used by a completely different socket bound to a different port, which could lead to us incorrectly forwarding packets. Reported-by: Chris Kuhn <kuhnchris@kuhnchris.eu> Reported-by: Jay <bugs.passt.top@bitsbetwixt.com> Link: https://bugs.passt.top/show_bug.cgi?id=57 Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	udp: Consistently use -1 to indicate un-opened sockets in maps	David Gibson	2023-11-07	1	-5/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	udp uses the udp_tap_map, udp_splice_ns and udp_splice_init tables to keep track of already opened sockets bound to specific ports. We need a way to indicate entries where a socket hasn't been opened, but the code isn't consistent if this is indicated by a 0 or a -1: * udp_splice_sendfrom() and udp_tap_handler() assume that 0 indicates an unopened socket * udp_sock_init() fills in -1 for a failure to open a socket * udp_timer_one() is somewhere in between, treating only strictly positive fds as valid -1 (or, at least, negative) is really the correct choice here, since 0 is a theoretically valid fd value (if very unlikely in practice). Change to use that consistently throughout. The table does need to be initialised to all -1 values before any calls to udp_sock_init() which can happen from conf_ports(). Because C doesn't make it easy to statically initialise non zero values in large tables, this does require a somewhat awkward call to initialise the table from conf(). This is the best approach I could see for the short term, with any luck it will go away at some point when those socket tables are replaced by a unified flow table. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	pif: Pass originating pif to tap handler functions	David Gibson	2023-11-07	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \|	For now, packets passed to the various *_tap_handler() functions always come from the single "tap" interface. We want to allow the possibility to broaden that in future. As preparation for that, have the code in tap.c pass the pif id of the originating interface to each of those handler functions. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	pif: Record originating pif in listening socket refs	David Gibson	2023-11-07	1	-11/+12
\| \| \| \| \| \| \| \| \| \|	For certain socket types, we record in the epoll ref whether they're sockets in the namespace, or on the host. We now have the notion of "pif" to indicate what "place" a socket is associated with, so generalise the simple one-bit 'ns' to a pif id. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	udp: Clean up ref initialisation in udp_sock_init()	David Gibson	2023-11-07	1	-5/+2
\| \| \| \| \| \| \| \| \|	udp_sock_init() has a number of paths that initialise uref differently. However some of the fields are initialised the same way in all of them. Move those fields into the original initialiser to save a few lines. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	cppcheck: Make many pointers const	David Gibson	2023-10-04	1	-2/+2
\| \| \| \| \| \| \| \| \|	Newer versions of cppcheck (as of 2.12.0, at least) added a warning for pointers which could be declared to point at const data, but aren't. Based on that, make many pointers throughout the codebase const. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	udp, tap: Correctly advance through packets in udp_tap_handler()	David Gibson	2023-09-08	1	-7/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In both tap4_handler() and tap6_handler(), once we've sorted incoming l3 packets into "sequences", we then step through all the packets in each DUP sequence calling udp_tap_handler(). Or so it appears. In fact, udp_tap_handler() doesn't take an index and always starts with packet 0 of the sequence, even if called repeatedly. It appears to be written with the idea that the struct pool is a queue, from which it consumes packets as it processes them, but that's not how the pool data structure works. Correct this by adding an index parameter to udp_tap_handler() and altering the loops in tap.c to step through the pool properly. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	tcp, udp: Don't pre-fill IPv4 destination address in headers	David Gibson	2023-08-22	1	-7/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Because packets sent on the tap interface will always be going to the guest/namespace, we more-or-less know what address they'll be going to. So we pre-fill this destination address in our header buffers for IPv4. We can't do the same for IPv6 because we could need either the global or link-local address for the guest. In future we're going to want more flexibility for the destination address, so this pre-filling will get in the way. Change the flow so we always fill in the IPv4 destination address for each packet, rather than prefilling it from proto_update_l2_buf(). In fact for TCP we already redundantly filled the destination for each packet anyway. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	tcp, udp: Don't include destination address in partially precomputed csums	David Gibson	2023-08-22	1	-11/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We partially prepopulate IP and TCP header structures including, amongst other things the destination address, which for IPv4 is always the known address of the guest/namespace. We partially precompute both the IPv4 header checksum and the TCP checksum based on this. In future we're going to want more flexibility with controlling the destination for IPv4 (as we already do for IPv6), so this precomputed value gets in the way. Therefore remove the IPv4 destination from the precomputed checksum and fold it into the checksum update when we actually send a packet. Doing this means we no longer need to recompute those partial sums when the destination address changes ({tcp,udp}_update_l2_buf()) and instead the computation can be moved to compile time. This means while we perform slightly more computations on each packet, we slightly reduce the amount of memory we need to access. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	tap: Pass source address to protocol handler functions	David Gibson	2023-08-22	1	-6/+8
\| \| \| \| \| \| \| \| \| \| \|	The tap code passes the IPv4 or IPv6 destination address of packets it receives to the protocol specific code. Currently that protocol code doesn't use the source address, but we want it to in future. So, in preparation, pass the IPv4/IPv6 source address of tap packets to those functions as well. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	epoll: Tiny cleanup to udp_sock_handler()	David Gibson	2023-08-13	1	-1/+1
\| \| \| \| \| \| \| \| \|	Move the test for c->no_udp into the function itself, rather than in the dispatching switch statement to better localize the UDP specific logic, and make for greated consistency with other handler functions. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	epoll: Generalize epoll_ref to cover things other than sockets	David Gibson	2023-08-13	1	-7/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The epoll_ref type includes fields for the IP protocol of a socket, and the socket fd. However, we already have a few things in the epoll which aren't protocol sockets, and we may have more in future. Rename these fields to an abstract "fd type" and file descriptor for more generality. Similarly, rather than using existing IP protocol numbers for the type, introduce our own number space. For now these just correspond to the supported protocols, but we'll expand on that in future. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	util: Make ns_enter() a void function and report setns() errors	David Gibson	2023-08-04	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ns_enter() returns an integer... but it's always zero. If we actually fail the function doesn't return. Therefore it makes more sense for this to be a function returning void, and we can remove the cases where we pointlessly checked its return value. In addition ns_enter() is usually called from an ephemeral thread created by NS_CALL(). That means that the exit(EXIT_FAILURE) there usually won't be reported (since NS_CALL() doesn't wait() for the thread). So, use die() instead to print out some information in the unlikely event that our setns() here does fail. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	Use C11 anonymous members to make poll refs less verbose to use	David Gibson	2023-08-04	1	-27/+23
\| \| \| \| \| \| \| \| \| \| \| \|	union epoll_ref has a deeply nested set of structs and unions to let us subdivide it into the various different fields we want. This means that referencing elements can involve an awkward long string of intermediate fields. Using C11 anonymous structs and unions lets us do this less clumsily. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>